SpectrumWebCo

High-Performance Object Storage for ML: MinIO Integration in Kled.io

Daniel Park

January 15, 2025

<h2>Introduction</h2> <p>Data is the foundation of machine learning. From raw datasets to processed features, trained models to evaluation metrics, ML workflows generate and consume vast amounts of data. Managing this data efficiently requires a storage solution that is performant, scalable, and seamlessly integrated into the ML lifecycle.</p> <p>In the Kled.io platform, we've integrated MinIO—a high-performance, S3-compatible object storage system—to provide ML teams with a robust solution for their data management needs. This article explores how MinIO integration in Kled.io enables efficient storage and retrieval of ML assets throughout the model development lifecycle.</p> <h2>The Storage Challenge in ML</h2> <p>Machine learning workflows have unique storage requirements:</p> <ul> <li><strong>Scale</strong>: Datasets can range from gigabytes to terabytes or even petabytes</li> <li><strong>Access patterns</strong>: Frequent reads during training, occasional writes during preprocessing</li> <li><strong>Versioning</strong>: Need to track multiple versions of datasets and models</li> <li><strong>Performance</strong>: Fast access to optimize training throughput</li> <li><strong>Compatibility</strong>: Seamless integration with various ML frameworks and tools</li> <li><strong>Data governance</strong>: Compliance with security and privacy requirements</li> </ul> <p>Traditional file systems and databases often struggle with these requirements, particularly at scale. Cloud storage solutions like Amazon S3 provide excellent scalability but can introduce latency, vendor lock-in, and significant costs.</p> <h2>What is MinIO?</h2> <p>MinIO is an open-source, high-performance object storage system designed for cloud-native applications. 
Its key features include:</p> <ul> <li><strong>S3 compatibility</strong>: Full implementation of the Amazon S3 API</li> <li><strong>High performance</strong>: Optimized for large-scale data workloads</li> <li><strong>Distributed architecture</strong>: Horizontal scaling across multiple nodes</li> <li><strong>Erasure coding</strong>: Data protection with minimal storage overhead</li> <li><strong>Bitrot protection</strong>: Automatic data integrity checking</li> <li><strong>Encryption</strong>: Data security at rest and in transit</li> <li><strong>WORM</strong>: Write-Once-Read-Many support for regulatory compliance</li> </ul> <p>MinIO can be deployed on-premises, in the cloud, or in hybrid environments, providing flexibility and consistent performance regardless of deployment model.</p> <h2>MinIO Integration in Kled.io</h2> <p>Kled.io's MinIO integration is specifically tailored for ML workflows:</p> <h3>1. Managed MinIO Clusters</h3> <p>We provide fully managed MinIO clusters optimized for ML workloads:</p> <p><img src="https://images.unsplash.com/photo-1562577309-4932fdd64cd1?q=80&#x26;w=2070&#x26;auto=format&#x26;fit=crop&#x26;ixlib=rb-4.0.3&#x26;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D" alt="MinIO Dashboard in Kled.io"></p> <h3>2. ML-Centric Bucket Organization</h3> <p>Our platform establishes a standardized bucket structure for ML assets:</p>
<pre><code>├── raw-data/              # Original, unprocessed datasets
│   ├── tabular/
│   ├── images/
│   └── text/
├── processed-data/        # Cleaned and transformed datasets
│   ├── features/
│   ├── train-test-splits/
│   └── embeddings/
├── models/                # Trained model artifacts
│   ├── checkpoints/
│   ├── final/
│   └── versioned/
└── experiments/           # Experiment results and metrics
    ├── runs/
    ├── visualizations/
    └── evaluation/
</code></pre> <h3>3. 
Framework Integration</h3> <p>MinIO in Kled.io seamlessly integrates with popular ML frameworks:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># PyTorch Example with MinIO</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> torch</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> torch</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">utils</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">data </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> Dataset</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> io</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> get_minio_client</span></span>
<span data-line=""> </span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">class</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> S3Dataset</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">(</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">Dataset</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">):</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">    def</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> __init__</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">(</span><span style="--shiki-light:#FF9800;--shiki-dark:#FF9800">self</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span><span style="--shiki-light:#FF9800;--shiki-dark:#FF9800"> bucket_name</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span><span style="--shiki-light:#FF9800;--shiki-dark:#FF9800"> prefix</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">):</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">        self</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">minio_client </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> get_minio_client</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">        self</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">bucket </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> bucket_name</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">        self</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">keys </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> []</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> </span></span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">        # List all objects with given prefix</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">        objects </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> self</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">minio_client</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">list_objects</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(bucket_name, prefix</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">prefix, recursive</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#1976D2;--shiki-dark:#79B8FF">True</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">        self</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">keys </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> [obj</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">object_name </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">for</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> obj </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">in</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> objects]</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> </span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">    def</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> __len__</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">(</span><span style="--shiki-light:#FF9800;--shiki-dark:#FF9800">self</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">):</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">        return</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> len</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(self.keys)</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> </span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">    def</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> __getitem__</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">(</span><span style="--shiki-light:#FF9800;--shiki-dark:#FF9800">self</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span><span style="--shiki-light:#FF9800;--shiki-dark:#FF9800"> idx</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">):</span></span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">        # Retrieve object from MinIO</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">        response </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> self</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">minio_client</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">get_object</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(self.bucket, self.keys[idx])</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">        try</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">:</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">            data </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> torch</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">load</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(io.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">BytesIO</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(response.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">read</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()))</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">        finally</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">:</span></span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">            # Release the connection (assumes get_minio_client() returns a minio-py style client)</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">            response</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">close</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">            response</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">release_conn</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">        return</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> data</span></span>
<span data-line=""> </span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Use the dataset</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">train_dataset </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> S3Dataset</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">'processed-data'</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">, </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">'features/customer_churn/'</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">train_loader </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span 
style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> torch</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">utils</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">data</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">DataLoader</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(train_dataset, batch_size</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">64</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span></code></pre></figure> <h3>4. Versioning and Lifecycle Management</h3> <p>Kled.io enhances MinIO with ML-specific versioning capabilities:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> DatasetManager</span></span> <span data-line=""> </span> <span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Store dataset with versioning</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">dataset_manager </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> DatasetManager</span><span 
style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">dataset_manager</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">save_dataset</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> name</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"customer_churn"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> version</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"1.2.0"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> files</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">[</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"features.parquet"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">, </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"labels.parquet"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">],</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> metadata</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">{</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "description"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"Customer churn prediction 
dataset"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "preprocessing"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"standard_scaler"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "feature_count"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">48</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> }</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span> <span data-line=""> </span> <span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Retrieve a specific version</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">dataset </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> dataset_manager</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">load_dataset</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> name</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"customer_churn"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> version</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"1.2.0"</span></span> <span data-line=""><span 
style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span></code></pre></figure> <h2>Real-World Example: Computer Vision Pipeline</h2> <p>Let's explore a practical example of using MinIO in Kled.io for a computer vision workflow:</p> <h3>1. Data Ingestion</h3> <p>The ML team starts by uploading a large image dataset to MinIO:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> FileUploader</span></span> <span data-line=""> </span> <span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Upload a directory of images with metadata</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">uploader </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> FileUploader</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">upload_task </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> uploader</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">upload_directory</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> 
local_path</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"/local/path/to/retail_images"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> bucket</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"raw-data"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> remote_path</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"images/retail_products/"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> metadata</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">{</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "source"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"in-store-cameras"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "collection_date"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"2025-01-05"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "image_count"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">25000</span></span> <span data-line=""><span 
style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> }</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span> <span data-line=""> </span> <span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Track upload progress</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">for</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> progress </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">in</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> upload_task</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">progress</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">():</span></span> <span data-line=""><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> print</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">f</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"Uploaded </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">{</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">progress.uploaded</span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">}</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">/</span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">{</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">progress.total</span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">}</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> files"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span></code></pre></figure> <h3>2. 
Preprocessing with Parallel Access</h3> <p>The team then processes these images using distributed workers:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> io</span></span>
<span data-line=""> </span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">compute </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> KledJob</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> get_minio_client</span></span>
<span data-line=""> </span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">def</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> preprocess_partition</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">(</span><span style="--shiki-light:#FF9800;--shiki-dark:#FF9800">partition_id</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span><span style="--shiki-light:#FF9800;--shiki-dark:#FF9800"> total_partitions</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">):</span></span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">    """Process a subset of the images in parallel."""</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">    minio_client </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> get_minio_client</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">    objects </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> minio_client</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">list_objects</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span>
<span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">        'raw-data'</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span>
<span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">        prefix</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">'images/retail_products/'</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span>
<span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">        recursive</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#1976D2;--shiki-dark:#79B8FF">True</span></span>
<span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">    )</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> </span></span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">    # Distribute work across partitions</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">    partition_objects </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> [obj </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">for</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> idx</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> obj </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">in</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> enumerate</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(objects)</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">                         if</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> idx </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">%</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> total_partitions </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">==</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> partition_id]</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> </span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">    for</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> obj </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">in</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> partition_objects</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">:</span></span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">        # Retrieve image</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">        response </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> minio_client</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">get_object</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">'raw-data'</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">, obj.object_name)</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">        try</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">:</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">            image_data </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> response</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">read</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">        finally</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">:</span></span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">            # Release the connection (assumes a minio-py style client)</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">            response</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">close</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">            response</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">release_conn</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> </span></span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">        # Process image...</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">        processed_image_data </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> image_data</span><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">  # placeholder: apply the real transform here</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> </span></span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">        # Save processed image</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">        processed_path </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> obj</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">object_name</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">replace</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">'retail_products'</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">, </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">'retail_products_processed'</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">        minio_client</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">put_object</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span>
<span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">            'processed-data'</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span>
<span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">            processed_path,</span></span>
<span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">            io.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">BytesIO</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(processed_image_data),</span><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C">  # file-like wrapper; a minio-py style put_object reads from a stream, not raw bytes</span></span>
<span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">            length</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">len</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(processed_image_data)</span></span>
<span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">        )</span></span>
<span data-line=""> </span>
<span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Launch 10 parallel jobs</span></span>
<span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">jobs </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> []</span></span>
<span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">for</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> i </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">in</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> range</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">10</span><span 
style="--shiki-light:#212121;--shiki-dark:#BBBBBB">):</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> job </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> KledJob</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> preprocess_partition,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> args</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(i, </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">10</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">),</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> resources</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">{</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"cpu"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">4</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">, </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"memory"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"16Gi"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">}</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> )</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> jobs</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">append</span><span 
style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(job.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">submit</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">())</span></span></code></pre></figure> <h3>3. Model Training with Efficient Data Access</h3> <p>The model training process streams data directly from MinIO:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">ml </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> KledTrainer</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> DatasetManager</span></span> <span data-line=""> </span> <span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Configure dataset</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">dataset_manager </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> DatasetManager</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">dataset 
</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> dataset_manager</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">create_dataset</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> name</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"retail_products"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> bucket</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"processed-data"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> path</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"images/retail_products_processed"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> format</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"image"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> partition_strategy</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"shard"</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span> <span data-line=""> </span> <span data-line=""><span 
style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Configure and launch training</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">trainer </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> KledTrainer</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> framework</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"pytorch"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> model_type</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"efficientnet-b3"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> dataset</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">dataset,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> output_bucket</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"models"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> output_path</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"retail/product_classifier"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> hyperparameters</span><span 
style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">{</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "learning_rate"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">1e-4</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "batch_size"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">32</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "epochs"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">10</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> }</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span> <span data-line=""> </span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">training_job </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> trainer</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">train</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span></code></pre></figure> <h3>4. 
Model Serving with Versioning</h3> <p>Finally, the trained model is stored with versioning and deployed:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">ml </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> ModelRegistry</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">serving </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> ModelServer</span></span> <span data-line=""> </span> <span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Store model with version information</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">model_registry </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> ModelRegistry</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">model_version </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> model_registry</span><span 
style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">register_model</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> name</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"retail_product_classifier"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> version</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"1.0.0"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> artifacts_path</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"models/retail/product_classifier"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> metrics</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">{</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "accuracy"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">0.92</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "precision"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">0.89</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span 
data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "recall"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">0.94</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> },</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> metadata</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">{</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "model_type"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"efficientnet-b3"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "training_dataset"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"retail_products@1.0.0"</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> }</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span> <span data-line=""> </span> <span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Deploy model</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">server </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> ModelServer</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">deployment </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> server</span><span 
style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">deploy_model</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> model_version</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">model_version,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> replicas</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">3</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> resources</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">{</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"cpu"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">2</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">, </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"memory"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"4Gi"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">, </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"gpu"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">1</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">}</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span></code></pre></figure> <h2>Performance Optimization for ML Workloads</h2> <p>Kled.io implements several optimizations for 
ML storage patterns:</p> <h3>1. Read-Ahead Buffering</h3> <p>ML workloads often perform sequential reads of large files. Our MinIO integration includes intelligent read-ahead buffering:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Configure read-ahead buffer size based on data characteristics</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">configure</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> read_ahead_buffer_size</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"64MB"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">, </span><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Optimized for large sequential reads</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> read_ahead_concurrency</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">4</span><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"> # Number of concurrent read-ahead operations</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span></code></pre></figure> <h3>2. 
Data Locality</h3> <p>The platform schedules compute jobs close to data:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Automatic data locality</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">training_job </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">ml</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">TrainingJob</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> script</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"train.py"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> data_locality</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#1976D2;--shiki-dark:#79B8FF">True</span><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"> # Schedule job on nodes with local data access</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span></code></pre></figure> <h3>3. 
Tiered Storage</h3> <p>We implement tiered storage for cost optimization:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="yaml" data-theme="min-light min-dark"><code data-language="yaml" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Storage class configuration</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F8F8F8">storage_tiers</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">:</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F8F8F8"> hot</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">:</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F8F8F8"> type</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">:</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "ssd"</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F8F8F8"> retention_policy</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">:</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "30 days"</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F8F8F8"> warm</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">:</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F8F8F8"> type</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">:</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "hdd"</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F8F8F8"> retention_policy</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">:</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "180 days"</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F8F8F8"> cold</span><span 
style="--shiki-light:#D32F2F;--shiki-dark:#F97583">:</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F8F8F8"> type</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">:</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "archive"</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F8F8F8"> retrieval_time</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">:</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "hours"</span></span></code></pre></figure> <h3>4. Bandwidth Throttling</h3> <p>Prevent storage operations from interfering with training:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Configure bandwidth limits</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">configure</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> upload_bandwidth_limit</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"500MB/s"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">, </span><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Limit upload bandwidth</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> download_bandwidth_limit</span><span 
style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"2GB/s"</span><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"> # Higher limit for downloads during training</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span></code></pre></figure> <h2>Benefits Realized</h2> <p>ML teams using MinIO in Kled.io have reported significant improvements:</p> <ol> <li><strong>Reduced data loading time by 65%</strong>: Compared to network-attached storage</li> <li><strong>Storage costs reduced by 40%</strong>: Compared to equivalent cloud storage</li> <li><strong>99.999% availability</strong>: Mission-critical reliability for ML pipelines</li> <li><strong>Up to 90% reduction in data transfer costs</strong>: By keeping data and compute together</li> <li><strong>Simplified compliance</strong>: Built-in data governance features</li> </ol> <h2>Best Practices for ML Storage with MinIO</h2> <p>Based on our experience, we recommend these best practices:</p> <h3>1. Organize Data by Access Pattern</h3> <p>Structure your buckets based on how data is accessed:</p> <pre><code># For frequent, high-throughput access (training data)
bucket: hot-training-data
policy: CACHE_RECENT

# For occasional access with high durability (models)
bucket: models-archive
policy: VERSIONED_DURABILITY
</code></pre>
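<p>As a minimal sketch of how this layout might be applied from application code, the snippet below maps an access pattern to the bucket and policy names listed above. The <code>BucketPlan</code> class, the <code>BUCKET_PLANS</code> table, and the <code>plan_for</code> and <code>object_uri</code> helpers are hypothetical names introduced for illustration; they are not part of the Kled.io or MinIO APIs. The access-pattern keys and the <code>s3://</code> URI convention are likewise assumptions.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketPlan:
    """Hypothetical record pairing a bucket with its storage policy."""
    bucket: str
    policy: str


# Bucket and policy names come from the configuration above;
# the access-pattern keys are assumptions made for this sketch.
BUCKET_PLANS = {
    "training": BucketPlan("hot-training-data", "CACHE_RECENT"),
    "model": BucketPlan("models-archive", "VERSIONED_DURABILITY"),
}


def plan_for(access_pattern: str) -> BucketPlan:
    """Return the bucket plan for an access pattern, or raise ValueError."""
    try:
        return BUCKET_PLANS[access_pattern]
    except KeyError:
        known = ", ".join(sorted(BUCKET_PLANS))
        raise ValueError(
            f"unknown access pattern {access_pattern!r}; expected one of: {known}"
        ) from None


def object_uri(access_pattern: str, key: str) -> str:
    """Build an s3-style URI for a key stored in the matching bucket."""
    plan = plan_for(access_pattern)
    return f"s3://{plan.bucket}/{key.lstrip('/')}"
```

<p>Keeping the mapping in one table means a pipeline step only has to declare how it accesses data, for example <code>object_uri("training", "images/batch_0001.tar")</code>, and an unknown pattern fails early instead of silently writing to the wrong bucket.</p>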

<h3>2. Use Intelligent Batching</h3> <p>Mitigate the small-file problem in ML by batching many small objects into larger files:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Batch small files for better performance</span></span> <span data-line=""><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">from</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> kled</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">import</span><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0"> FileOptimizer</span></span> <span data-line=""> </span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">optimizer </span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0"> FileOptimizer</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">()</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">optimizer</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">batch_small_files</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> bucket</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"raw-data"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> prefix</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span 
style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"logs/"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> target_size</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"128MB"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> format</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"parquet"</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span></code></pre></figure> <h3>3. Implement Data Compression</h3> <p>Use appropriate compression for different data types:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Compress based on data type</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage_client</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">put_object</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> bucket_name</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"processed-data"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> object_name</span><span 
style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"embeddings/customer_vectors.npy"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> data</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">file_data,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> length</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">length,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> compression</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"zstd"</span><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"> # Good for numeric data</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span> <span data-line=""> </span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage_client</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">put_object</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> bucket_name</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"raw-data"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> object_name</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span 
style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"text/articles.json"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> data</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">file_data,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> length</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">length,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> compression</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"zlib"</span><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"> # Good for text data</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span></code></pre></figure> <h3>4. 
Use Metadata for Searchability</h3> <p>Enrich your data with metadata for easier discovery:</p> <figure data-rehype-pretty-code-figure=""><pre tabindex="0" data-language="python" data-theme="min-light min-dark"><code data-language="python" data-theme="min-light min-dark" style="display: grid;"><span data-line=""><span style="--shiki-light:#C2C3C5;--shiki-dark:#6B737C"># Add rich metadata for searchability</span></span> <span data-line=""><span style="--shiki-light:#24292EFF;--shiki-dark:#B392F0">storage_client</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">.</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">put_object</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> bucket_name</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"models"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> object_name</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"recommenders/collaborative_filter_v2.pkl"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> data</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">model_data,</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> length</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#6F42C1;--shiki-dark:#B392F0">len</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">(model_data),</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> 
metadata</span><span style="--shiki-light:#D32F2F;--shiki-dark:#F97583">=</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">{</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "algorithm"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"matrix_factorization"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "dimensions"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">128</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "training_dataset"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"user_interactions_jan2025"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "accuracy"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#1976D2;--shiki-dark:#F8F8F8">0.87</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">,</span></span> <span data-line=""><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70"> "owner"</span><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">: </span><span style="--shiki-light:#22863A;--shiki-dark:#FFAB70">"recommendation_team"</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB"> }</span></span> <span data-line=""><span style="--shiki-light:#212121;--shiki-dark:#BBBBBB">)</span></span></code></pre></figure> <h2>Ethical Considerations</h2> <p>When implementing object storage for ML, consider these ethical aspects:</p> <ul> <li><strong>Data 
retention</strong>: Establish clear policies for how long data is kept and when it is deleted</li> <li><strong>Cost transparency</strong>: Make storage costs visible to encourage efficient usage</li> <li><strong>Environmental impact</strong>: Balance performance needs against energy consumption</li> <li><strong>Data sovereignty</strong>: Respect geographic restrictions on where data may be stored</li> </ul> <h2>Conclusion</h2> <p>MinIO integration in Kled.io gives ML teams a powerful, S3-compatible object store that meets the particular requirements of machine learning workflows. By offering high performance, scalability, and integration with ML frameworks, it lets teams focus on model development rather than storage management.</p> <p>Planned enhancements to our MinIO integration include:</p> <ul> <li>Automated data quality validation during ingestion</li> <li>ML-specific data cataloging and discovery</li> <li>Enhanced dataset versioning with lineage tracking</li> <li>Intelligent caching based on usage patterns</li> </ul> <p>As machine learning datasets continue to grow in size and complexity, robust object storage becomes increasingly important. MinIO's high-performance architecture, combined with Kled.io's ML-specific enhancements, provides a foundation for scaling ML operations efficiently.</p> <blockquote> <p>"Storage isn't just about keeping data; it's about making data accessible at the speed of innovation."</p> </blockquote> <p>For more information on MinIO integration in Kled.io, visit our <a href="https://kled.io/docs/storage/minio">storage documentation</a>.</p>
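<p>As a closing illustration of the batching practice described earlier, here is a minimal sketch of how small objects might be grouped toward a target size before being rewritten as larger files. It is not how Kled.io's <code>FileOptimizer</code> works internally; the function <code>plan_batches</code>, the object names, and the sizes are all hypothetical, and a simple greedy strategy is assumed.</p>

```python
# Hypothetical sketch: greedy planning of small-file batches.
# plan_batches is an illustrative helper, not part of kled.storage or MinIO.

MB = 1024 * 1024


def plan_batches(objects, target_bytes):
    """Group (name, size_in_bytes) pairs, in order, into batches.

    A batch is closed as soon as adding the next object would push it
    past target_bytes. An object larger than target_bytes ends up in a
    batch of its own.
    """
    batches = []
    current = []
    current_size = 0
    for name, size in objects:
        # Close the running batch if this object would overflow it.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Seven made-up 40 MB log files, batched toward a 128 MB target.
small_logs = [(f"logs/part-{i:04d}.json", 40 * MB) for i in range(7)]
batches = plan_batches(small_logs, target_bytes=128 * MB)

for index, batch in enumerate(batches):
    print(f"batch {index}: {len(batch)} objects")
```

<p>Each planned batch could then be read, concatenated, and written back as one larger object. A greedy pass keeps objects in listing order, which is usually what you want for time-ordered logs, though it does not minimize the number of batches.</p>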