Overview
Plant disease classification CNNs are typically hosted on cloud servers — requiring internet connectivity, introducing latency, and exposing sensitive crop data. Edge computing moves inference to the device itself, but microcontrollers have severe memory constraints that most CNN architectures cannot satisfy.
Chroma-Sense is a novel architecture that classifies plant leaf diseases directly on the camera, running at real-time speeds on devices with as little as 256 KB of heap memory and 128 KB of flash — without any cloud connection.
⚠ The Problem
- Cloud CNNs require internet — unreliable in rural fields
- Network latency prevents real-time response
- Security risk: proprietary crop data leaves the farm
- Mobile architectures (MobileNet, EfficientNet) still too heavy for smallest devices
✓ Chroma-Sense Solution
- Runs entirely on-device — no internet needed
- 8.4 FPS on OpenMV H7 (256 KB heap, 128 KB flash)
- Only 15K parameters, 54 KB quantized model
- Deploys on drones, cameras, IoT nodes in the field
Core Insight
Plant disease symptoms — spots, rings, patches, mosaics, stripes — are semantically simple. More importantly, the same types of shapes appear in each individual RGB channel, just with different brightness distributions.
Per-Channel Anatomy of Apple Cedar Rust
Each channel contains a simple spot or ring — not the complex multi-color concentric pattern of the full image. A small CNN can learn to detect these simple shapes with very few feature maps. And since the same shape types appear in every channel, the same feature extractor works for all three.
Serial Multi-Channel Processing
Sharing the extractor slashes parameter count (no 3× duplication). Processing sequentially means only one channel's activations ever live in RAM — cutting peak memory by ~2/3 vs. parallel multi-channel approaches.
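The ~2/3 figure follows from simple arithmetic. A minimal sketch, assuming int8 activations and that the first convolution's output (96 × 96 × 8) dominates peak activation memory for one channel (the dominance assumption is ours, for illustration):

```python
def activation_bytes(h, w, c, bytes_per_value=1):
    # int8 activations: one byte per value
    return h * w * c * bytes_per_value

peak_one_channel = activation_bytes(96, 96, 8)  # 73,728 B after the first conv
peak_parallel = 3 * peak_one_channel            # R, G, B resident at once
peak_serial = peak_one_channel                  # one channel at a time
saving = 1 - peak_serial / peak_parallel        # ~2/3
```

The same extractor weights are reused for each pass, so only the activations, not the parameters, scale with the number of channels in flight.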
Why GlobalMaxPooling? Disease symptoms appear at random locations and in random quantities — a leaf may have 2 spots or 20. Average pooling scores a sparse leaf far lower than a dense one (e.g. 0.2 vs. 1.0), causing misclassifications. Max pooling returns a high score whenever any spot matches the filter, regardless of count — critical after Int8 quantization, which amplifies this score variance.
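A toy NumPy comparison makes the pooling argument concrete (illustrative numbers, not from the paper):

```python
import numpy as np

# Two post-activation feature maps: a sparse leaf (2 hot spots)
# vs. a dense leaf (spots everywhere), all activations in [0, 1]
sparse = np.zeros((6, 6))
sparse[1, 1] = sparse[4, 4] = 1.0
dense = np.ones((6, 6))

avg_sparse, avg_dense = sparse.mean(), dense.mean()  # ~0.06 vs 1.0: big gap
max_sparse, max_dense = sparse.max(), dense.max()    # 1.0 vs 1.0: same score
```

With average pooling the classifier sees very different scores for the same disease; with max pooling both leaves produce the same strong response.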
Memory Efficiency
Sequential channel processing cuts effective CNN width by ~2/3, reducing peak RAM by ~25% vs. MobileNetV3.
Fewer Parameters
Shared extractor across all channels: 15K params vs. 97K in the closest competitor.
Better Features
Grayscale-per-channel training forces shape/texture learning over color shortcuts.
Explainability
Channel-specific feature maps are interpretable — Grad-CAM highlights disease zones clearly.
Architecture
The SeparableConv2D layers follow MobileNet's depthwise-separable factorization — separating spatial filtering from channel mixing — cutting compute and parameters significantly. At just 32–48 feature maps per layer, the linear bottleneck / inverted residual blocks used by MobileNetV2 offer no benefit and were omitted.
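The parameter saving from the factorization is easy to verify by hand. A minimal arithmetic sketch for one 3×3 layer with 48 input and 48 output channels (the widest layers here; Keras-style counting with depth multiplier 1 and bias terms included):

```python
def conv2d_params(k, c_in, c_out):
    # Standard Conv2D: one k*k*c_in kernel per output channel, plus bias
    return k * k * c_in * c_out + c_out

def separable_conv2d_params(k, c_in, c_out):
    # Depthwise (k*k per input channel) + pointwise (1x1 mixing) + bias
    return k * k * c_in + c_in * c_out + c_out

dense_conv = conv2d_params(3, 48, 48)               # 20,784 params
separable = separable_conv2d_params(3, 48, 48)      # 2,784 params
ratio = dense_conv / separable                      # ~7.5x fewer
```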
Code Guide
Feature Extractor
```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, SeparableConv2D
from tensorflow.keras.models import Model

def build_feature_extractor(input_shape=(96, 96, 1)):
    inp = Input(shape=input_shape)
    x = Conv2D(8, (3, 3), padding='same', activation='relu')(inp)
    x = MaxPooling2D((2, 2))(x)
    for filters in [24, 48, 48, 48]:
        x = SeparableConv2D(filters, (3, 3), padding='same', activation='relu')(x)
        x = MaxPooling2D((2, 2))(x)
    return Model(inp, x)  # 7,352 params — reused 3×
```
Full Chroma-Sense Model
```python
from tensorflow.keras.layers import (Input, Lambda, Concatenate, Conv2D,
                                     GlobalMaxPooling2D, Dense)
from tensorflow.keras.models import Model

def build_chroma_sense(num_classes):
    fe = build_feature_extractor()  # shared weights
    rgb = Input(shape=(96, 96, 3))
    r = Lambda(lambda x: x[:, :, :, 0:1])(rgb)
    g = Lambda(lambda x: x[:, :, :, 1:2])(rgb)
    b = Lambda(lambda x: x[:, :, :, 2:3])(rgb)
    rf, gf, bf = fe(r), fe(g), fe(b)  # sequential, same extractor
    x = Concatenate()([rf, gf, bf])   # (3, 3, 144)
    x = Conv2D(48, (1, 1), activation='relu')(x)  # cross-channel attention
    x = Conv2D(16, (1, 1), activation='relu')(x)
    x = GlobalMaxPooling2D()(x)
    out = Dense(num_classes, activation='softmax')(x)
    return Model(rgb, out)
```
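As a sanity check on the 15K figure, the per-layer parameter arithmetic can be tallied by hand (the final Dense layer here assumes 4 apple classes — our assumption for illustration, not stated in the code):

```python
def sep_params(k, c_in, c_out):
    # SeparableConv2D: depthwise (k*k per input ch.) + pointwise + bias
    return k * k * c_in + c_in * c_out + c_out

extractor = (3 * 3 * 1 * 8 + 8               # Conv2D(8), 1-channel input
             + sep_params(3, 8, 24)
             + sep_params(3, 24, 48)
             + 2 * sep_params(3, 48, 48))    # shared across R, G, B

head = (144 * 48 + 48) + (48 * 16 + 16) + (16 * 4 + 4)  # 1x1 convs + Dense(4)
total = extractor + head                     # ~15K parameters
```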
Int8 Quantization (TFLite)
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = converter.inference_output_type = tf.int8

def representative_dataset():
    # Calibration samples for estimating int8 activation ranges
    # (val_dataset: the validation tf.data.Dataset)
    for imgs, _ in val_dataset.take(100):
        yield [imgs]

converter.representative_dataset = representative_dataset
open('trained.tflite', 'wb').write(converter.convert())  # → 54 KB
```
On-Device Inference (OpenMV / MicroPython)
```python
import sensor, tf, uos, gc, time

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.set_windowing((240, 240))
sensor.skip_frames(time=2000)  # let auto-exposure settle

labels = open("/labels.txt").read().splitlines()
net = tf.load("/trained", load_to_fb=uos.stat('/trained')[6] > gc.mem_free() - 65536)

clock = time.clock()
while True:
    clock.tick()
    img = sensor.snapshot()
    pred = sorted(zip(labels, net.classify(img)[0].output()),
                  key=lambda x: -x[1])
    print(f"{pred[0][0]} ({pred[0][1]:.0%}) — {clock.fps():.1f} FPS")
```
Training
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Input size | 96 × 96 × 3 |
| Batch size | 128 |
| Epochs (initial) | 250 |
| Learning rate | 0.0005 |
| Optimizer | Adam |
| Loss | Categorical Crossentropy |
| Fine-tune epochs | 10 |
| Fine-tune LR | 0.000045 |
| Layers unfrozen | Final 65% |
| Hardware | NVIDIA Tesla T4 |
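The "Final 65%" unfreezing policy from the table can be sketched as a small helper (`split_for_finetune` is a hypothetical name, not from the original training code):

```python
def split_for_finetune(layers, unfrozen_frac=0.65):
    """Split a layer list so the final `unfrozen_frac` share is trainable."""
    cut = len(layers) - round(len(layers) * unfrozen_frac)
    return layers[:cut], layers[cut:]

layers = [f"layer_{i}" for i in range(20)]
frozen, unfrozen = split_for_finetune(layers)
# 20 layers -> 7 stay frozen, the final 13 (65%) are fine-tuned
```

In Keras the frozen portion would get `layer.trainable = False` before recompiling with the lower fine-tune learning rate.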
Dataset
- Source: PlantVillage (Kaggle)
- Plants: Apple, Tomato, Grape, Corn
- Total: ~18,000 images, 18 classes
- Split: 80% train / 20% validation
- Resolution: 96×96×3 (cropped)
- Preprocessing: backgrounds removed, leaving the disease symptoms themselves as the only consistent visual features
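The 96×96 crop from the preprocessing step can be sketched as a plain NumPy center crop (illustrative; the exact cropping pipeline used by the authors is not specified):

```python
import numpy as np

def center_crop(img, size=96):
    """Center-crop an HxWxC array to size x size."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

leaf = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for a dataset image
patch = center_crop(leaf)
print(patch.shape)  # (96, 96, 3)
```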
Final (Apple): Training 98.08% · Validation 92.18% · Loss 0.2587
Results
F1 Scores by Plant & Disease (Int8)
Apple Subset — Validation & Test
| Split | Format | Accuracy | ROC-AUC | Weighted F1 | Loss |
|---|---|---|---|---|---|
| Validation | Int8 | 92.18% | 0.9909 | 0.922 | — |
| Validation | Float32 | 92.02% | 0.9920 | 0.920 | — |
| Test (815 samples) | Int8 | 91.78% | 0.9849 | 0.9177 | 0.5916 |
| Test (815 samples) | Float32 | 91.53% | 0.9870 | 0.9153 | 0.3259 |
Edge Device Deployment
Flash Pre-built Firmware
```shell
# Flash with Edge Impulse CLI
edge-impulse-flash-tool --firmware edge_impulse_firmware_openmv_cam_h7.bin

# Copy these to device flash:
#   trained.tflite             → /trained
#   labels.txt                 → /labels.txt
#   ei_image_classification.py → main.py

# Reset device — classifies at boot, no setup needed
```
Edge Impulse Studio: Upload your dataset at edgeimpulse.com, configure an impulse with Image (96×96 RGB) → Chroma-Sense → Classification, train in-browser, and deploy with edge-impulse-run-impulse.
Comparison with Prior Work
All baselines were built at 3× the feature-extractor width of Chroma-Sense, matching their typical simultaneous R+G+B processing, and all were trained on the same dataset for 30 epochs with batch normalization. ❌ = model exceeded flash or heap and could not run on that device.
| Model | Apple Acc (%) | Tomato Acc (%) | RAM | Flash | Params | MADDs | H7 FPS | H7+ FPS | Nicla FPS |
|---|---|---|---|---|---|---|---|---|---|
| Conv2D | 95.2 | 88.8 | 336 KB | 502 KB | 491K | 130M | ❌ | 1.7 | ❌ |
| Grouped Conv | 86.4 | 71.7 | 336 KB | 187 KB | 169K | 43M | ❌ | 4.5 | ❌ |
| MobileNet | 86.7 | 86.6 | 341 KB | 99 KB | 66K | 21M | ❌ | 5.3 | ❌ |
| MobileNetV2 | 95.4 | 90.4 | 271 KB | 203 KB | 153K | 25M | ❌ | 7.0 | ❌ |
| MobileNetV3 | 92.8 | 92.6 | 216 KB | 138 KB | 97K | 20M | ❌ | 5.3 | 7.6 |
| EfficientNetV2 | 91.8 | 88.6 | 240 KB | 160 KB | 110K | 27M | ❌ | 5.1 | 6.9 |
| SqueezeNet | 97.2 | 83.5 | 338 KB | 80 KB | 56K | 23M | ❌ | 6.4 | ❌ |
| ShuffleNet | 90.2 | 83.0 | 347 KB | 129 KB | 88K | 13M | ❌ | 2.3 | ❌ |
| Squ. & Excitation | 82.2 | 79.1 | 149 KB | 550 KB | 527K | 32M | ❌ | 5.6 | ❌ |
| Multi-Ch. CNN | 88.8 | 86.1 | 160 KB | 87 KB | 62K | 43M | 4.7 | 4.7 | 4.0 |
| Chroma-Sense | 93.1 | 89.3 | 160 KB | 54 KB | 15K | 8.3M | 8.4 | 8.4 | 7.2 |
Citation
Related: SprayCraft — Graph-Based Route Optimization for Variable Rate Precision Spraying arXiv:2412.12176