IEEE CAI 2025  ·  Smart Agriculture  ·  TinyML

Chroma-Sense

A Memory-Efficient Plant Leaf Disease Classification Model for Resource-Constrained Edge Devices

University of North Texas  ·  Dept. of Computer Science and Engineering

25% Less RAM · 60% Less Flash · 15K Parameters · 8.4 FPS (OpenMV H7) · 92% Val. Accuracy · 8.3M MADDs
01

Overview

Plant disease classification CNNs are typically hosted on cloud servers — requiring internet connectivity, introducing latency, and exposing sensitive crop data. Edge computing moves inference to the device itself, but microcontrollers have severe memory constraints that most CNN architectures cannot satisfy.

Chroma-Sense is a novel architecture that classifies plant leaf diseases directly on the camera, running at real-time speeds on devices with as little as 256 KB of heap memory and 128 KB of flash — without any cloud connection.

⚠ The Problem

  • Cloud CNNs require internet — unreliable in rural fields
  • Network latency prevents real-time response
  • Security risk: proprietary crop data leaves the farm
  • Mobile architectures (MobileNet, EfficientNet) still too heavy for smallest devices

✓ Chroma-Sense Solution

  • Runs entirely on-device — no internet needed
  • 8.4 FPS on OpenMV H7 (256 KB heap, 128 KB flash)
  • Only 15K parameters, 54 KB quantized model
  • Deploys on drones, cameras, IoT nodes in the field
02

Core Insight

Plant disease symptoms — spots, rings, patches, mosaics, stripes — are semantically simple. More importantly, the same types of shapes appear in each individual RGB channel, just with different brightness distributions.

Per-Channel Anatomy of Apple Cedar Rust

🍎 RGB (full): orange/yellow spot, red ring, black center — complex combined pattern
R (Red channel): bright center (yellow = R+G) and bright red ring both visible
G (Green channel): only the yellow center is bright — the red ring is absent
B (Blue channel): only the black center spot is visible — everything else is dark

Each channel contains a simple spot or ring — not the complex multi-color concentric pattern of the full image. A small CNN can learn to detect these simple shapes with very few feature maps. And since the same shape types appear in every channel, the same feature extractor works for all three.
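The insight can be sketched in plain Python; the RGB triples below are hypothetical stand-ins for a cedar-rust lesion on a green leaf, not measured pixel values:

```python
# Hypothetical colors: green leaf background, yellow lesion center, red ring.
PALETTE = {
    "leaf":   (50, 140, 60),
    "center": (230, 210, 40),
    "ring":   (180, 40, 30),
}

def channel(idx):
    """Per-region intensity in one color channel (0=R, 1=G, 2=B)."""
    return {region: rgb[idx] for region, rgb in PALETTE.items()}

r, g, b = channel(0), channel(1), channel(2)

# A region "stands out" if it is clearly brighter than the leaf background.
bright = lambda ch: [k for k in ("center", "ring") if ch[k] > ch["leaf"] + 60]

print(bright(r))  # ['center', 'ring'] — both shapes visible in R
print(bright(g))  # ['center']         — ring disappears in G
print(bright(b))  # []                 — B channel is mostly dark
```

Each channel reduces to a single simple blob or ring against the background, which is exactly what a small shared extractor can learn.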

Serial Multi-Channel Processing

RGB Input (96×96×3) → Split (R / G / B) → Feature Extractor (shared weights) → Concatenate (3×3×144) → 1×1 Conv (cross-channel fusion) → GlobalMaxPool → Softmax

Sharing the extractor slashes parameter count (no 3× duplication). Processing sequentially means only one channel's activations ever live in RAM — cutting peak memory by ~2/3 vs. parallel multi-channel approaches.
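The RAM claim follows from simple arithmetic over the extractor's activation shapes (a sketch; the parallel baseline is assumed to be the same extractor at 3× channel width):

```python
# Activation shapes (H, W, C) at each extractor stage, from Section 03.
stages = [(96, 96, 8), (48, 48, 24), (24, 24, 48), (12, 12, 48), (6, 6, 48)]

def peak_activation(width_scale):
    # Largest single activation tensor (int8 bytes) that must be resident.
    return max(h * w * int(c * width_scale) for h, w, c in stages)

serial   = peak_activation(1)  # one channel at a time through the shared extractor
parallel = peak_activation(3)  # hypothetical 3x-wide extractor for all channels at once

print(serial, parallel)  # serial needs ~1/3 the peak activation RAM
```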

💡

Why GlobalMaxPooling? Diseases appear randomly — a leaf may have 2 spots or 20. Average pooling produces a score of 0.2 for a sparse leaf vs. 1.0 for a dense one, causing misclassifications. Max pooling always returns 1.0 if any spot matches the filter, regardless of count. Critical after Int8 quantization, which amplifies this variance.
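The pooling argument can be made concrete with two toy response maps (an illustrative sketch; spot responses are idealized as 0/1 and the maps are flattened 3×3 grids):

```python
# A filter's response map: 1.0 where a lesion matched the filter, 0.0 elsewhere.
sparse = [1.0] + [0.0] * 8   # leaf with a single spot
dense  = [1.0] * 9           # leaf covered in spots

avg = lambda m: sum(m) / len(m)

# Average pooling: the score depends on how MANY spots there are...
print(avg(sparse), avg(dense))   # ~0.11 vs 1.0 — same disease, very different score
# ...max pooling: the score only asks whether ANY spot matched.
print(max(sparse), max(dense))   # 1.0 vs 1.0 — count-invariant
```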

🧠

Memory Efficiency

Sequential processing cuts CNN width by ~2/3, reducing peak RAM 25% vs. MobileNetV3.

Fewer Parameters

Shared extractor across all channels: 15K params vs. 97K in the closest competitor.

🔍

Better Features

Grayscale-per-channel training forces shape/texture learning over color shortcuts.

🗺

Explainability

Channel-specific feature maps are interpretable — Grad-CAM highlights disease zones clearly.

03

Architecture

Feature Extractor — shared across R, G, B

Layer           | Output shape | Params
InputLayer      | (96, 96, 1)  | —
Conv2D 3×3      | (96, 96, 8)  | 80
MaxPooling 2×2  | (48, 48, 8)  | —
SeparableConv2D | (48, 48, 24) | 288
MaxPooling 2×2  | (24, 24, 24) | —
SeparableConv2D | (24, 24, 48) | 1,416
MaxPooling 2×2  | (12, 12, 48) | —
SeparableConv2D | (12, 12, 48) | 2,784
MaxPooling 2×2  | (6, 6, 48)   | —
SeparableConv2D | (6, 6, 48)   | 2,784
MaxPooling 2×2  | (3, 3, 48)   | —
Total params    |              | 7,352

Classification Head

Layer                 | Output shape     | Params
Input                 | (96, 96, 3)      | —
Lambda ×3             | (96, 96, 1) each | —
Extractor ×3 (shared) | (3, 3, 48) each  | —
Concatenate           | (3, 3, 144)      | —
Conv2D 1×1            | (3, 3, 48)       | ~6,960
Conv2D 1×1            | (3, 3, 16)       | ~784
GlobalMaxPool2D       | (16,)            | —
Dense + Softmax       | (N classes,)     | ~68
Total params (Apple)  |                  | ~15,164

The SeparableConv2D layers follow MobileNet's depthwise-separable factorization — separating spatial filtering from channel mixing — which cuts compute and parameter count significantly. At just 24–48 feature maps per layer, the linear bottleneck / inverted residual blocks used by MobileNetV2 offer no benefit and were omitted.
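The savings from the factorization can be verified from the layer widths above (a sketch using the standard Keras parameter-count formulas, with bias on the pointwise stage only):

```python
def conv2d_params(k, c_in, c_out):
    # Standard convolution: k*k*c_in weights per output map, plus bias.
    return k * k * c_in * c_out + c_out

def separable_params(k, c_in, c_out):
    # Depthwise (k*k per input map) + pointwise 1x1 mixing + bias.
    return k * k * c_in + c_in * c_out + c_out

# Extractor stack from the table: Conv2D 1->8, then separable 8->24, 24->48, 48->48, 48->48.
widths = [(1, 8), (8, 24), (24, 48), (48, 48), (48, 48)]
layers = [conv2d_params(3, *widths[0])] + [separable_params(3, a, b) for a, b in widths[1:]]
print(layers, sum(layers))  # -> [80, 288, 1416, 2784, 2784] 7352

# The same stack built from standard convolutions would cost several times more:
standard = sum(conv2d_params(3, a, b) for a, b in widths)
print(standard)             # -> 53816
```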

04

Code Guide

Feature Extractor

CAI.ipynb
from tensorflow.keras.layers import Input, Conv2D, SeparableConv2D, MaxPooling2D
from tensorflow.keras.models import Model

def build_feature_extractor(input_shape=(96, 96, 1)):
    inp = Input(shape=input_shape)
    x = Conv2D(8, (3,3), padding='same', activation='relu')(inp)
    x = MaxPooling2D((2,2))(x)
    for filters in [24, 48, 48, 48]:
        x = SeparableConv2D(filters, (3,3), padding='same', activation='relu')(x)
        x = MaxPooling2D((2,2))(x)
    return Model(inp, x)  # 7,352 params — reused 3×

Full Chroma-Sense Model

CAI.ipynb
from tensorflow.keras.layers import (Input, Lambda, Concatenate, Conv2D,
                                     GlobalMaxPooling2D, Dense)
from tensorflow.keras.models import Model

def build_chroma_sense(num_classes):
    fe = build_feature_extractor()        # shared weights

    rgb = Input(shape=(96, 96, 3))
    r = Lambda(lambda x: x[:,:,:,0:1])(rgb)
    g = Lambda(lambda x: x[:,:,:,1:2])(rgb)
    b = Lambda(lambda x: x[:,:,:,2:3])(rgb)

    rf, gf, bf = fe(r), fe(g), fe(b)      # same extractor applied per channel
    x = Concatenate()([rf, gf, bf])       # (3, 3, 144)
    x = Conv2D(48, (1,1), activation='relu')(x)  # cross-channel fusion
    x = Conv2D(16, (1,1), activation='relu')(x)
    x = GlobalMaxPooling2D()(x)
    out = Dense(num_classes, activation='softmax')(x)
    return Model(rgb, out)

Int8 Quantization (TFLite)

CAI.ipynb — export
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = converter.inference_output_type = tf.int8

def representative_dataset():
    for imgs, _ in val_dataset.take(100):
        yield [imgs]

converter.representative_dataset = representative_dataset
open('trained.tflite', 'wb').write(converter.convert())  # → 54 KB
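For intuition, the affine int8 mapping that full-integer quantization applies to each tensor can be sketched in plain Python (the scale and zero point below are illustrative placeholders, not values the converter actually chooses):

```python
def quantize(x, scale, zero_point):
    # Affine int8 mapping: q = round(x / scale) + zero_point, clamped to [-128, 127].
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

# Illustrative parameters for an activation tensor spanning roughly [0, 6]:
scale, zp = 6.0 / 255, -128

x = 0.37
q = quantize(x, scale, zp)
print(q, dequantize(q, scale, zp))  # round-trip error is at most ~scale/2
```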

On-Device Inference (OpenMV / MicroPython)

ei_image_classification.py → main.py on device
import sensor, tf, uos, gc, time

sensor.reset(); sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA); sensor.set_windowing((240, 240))
sensor.skip_frames(time=2000)  # auto-exposure settle

labels = open("/labels.txt").read().splitlines()
net    = tf.load("/trained", load_to_fb=uos.stat('/trained')[6] > gc.mem_free()-65536)

clock = time.clock()
while True:
    clock.tick()
    img  = sensor.snapshot()
    pred = sorted(zip(labels, net.classify(img)[0].output()), key=lambda x: -x[1])
    print(f"{pred[0][0]} ({pred[0][1]:.0%})  —  {clock.fps():.1f} FPS")
05

Training

Hyperparameters

Input size: 96 × 96 × 3
Batch size: 128
Epochs (initial): 250
Learning rate: 0.0005
Optimizer: Adam
Loss: Categorical Crossentropy
Fine-tune epochs: 10
Fine-tune LR: 0.000045
Layers unfrozen: Final 65%
Hardware: NVIDIA Tesla T4

Dataset

  • Source: PlantVillage (Kaggle)
  • Plants: Apple, Tomato, Grape, Corn
  • Total: ~18,000 images, 18 classes
  • Split: 80% train / 20% validation
  • Resolution: 96×96×3 (cropped)
  • Preprocessing: background removed, so only the leaf and its disease symptoms remain as consistent features
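The 80/20 split above can be reproduced deterministically with the standard library (a sketch; the file list and seed are placeholders, not from the paper):

```python
import random

def split_dataset(paths, val_frac=0.2, seed=42):
    # Sort, then shuffle a copy with a fixed seed so the split is reproducible.
    paths = sorted(paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_frac)
    return paths[n_val:], paths[:n_val]   # (train, val)

# Hypothetical file list standing in for the ~18,000 PlantVillage crops:
files = [f"img_{i:05d}.jpg" for i in range(1000)]
train, val = split_dataset(files)
print(len(train), len(val))  # 800 200
```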
📊

Final (Apple): Training 98.08% · Validation 92.18% · Loss 0.2587

06

Results

F1 Scores by Plant & Disease (Int8)

🍎 Apple: Healthy 0.93 · Black Rot 0.93 · Scab 0.95 · Cedar Rust 0.91
🍅 Tomato: Healthy 0.90 · Mold 0.88 · Leaf Curl 0.88 · Blight 0.91 · Mosaic 0.90 · Septoria Spot 0.87
🍇 Grape: Healthy 0.97 · Black Rot 0.95 · Esca 0.96 · Blight 0.96
🌽 Corn: Healthy 0.94 · Rust 0.91 · Blight 0.92 · Leaf Spots 0.95

Apple Subset — Validation & Test

Split              | Format  | Accuracy | ROC-AUC | Weighted F1 | Loss
Validation         | Int8    | 92.18%   | 0.9909  | 0.922       | —
Validation         | Float32 | 92.02%   | 0.9920  | 0.920       | —
Test (815 samples) | Int8    | 91.78%   | 0.9849  | 0.9177      | 0.5916
Test (815 samples) | Float32 | 91.53%   | 0.9870  | 0.9153      | 0.3259
07

Edge Device Deployment

OpenMV H7 · Heap (RAM): 256 KB · Flash: 128 KB · CPU: Cortex-M7 · 8.4 FPS · Only model that fits ✓
OpenMV H7 Plus · Heap (RAM): 4 MB · Flash: 32 MB · CPU: Cortex-M7 · 8.4 FPS · Highest FPS of all models ✓
Arduino Nicla Vision · Heap (RAM): 256 KB · Flash: 16 MB · CPU: M7 + M4 dual-core · 7.2 FPS · Dual-core ✓

Flash Pre-built Firmware

Terminal
# Flash with Edge Impulse CLI
edge-impulse-flash-tool --firmware edge_impulse_firmware_openmv_cam_h7.bin

# Copy these to device flash:
#   trained.tflite  →  /trained
#   labels.txt      →  /labels.txt
#   ei_image_classification.py  →  main.py
# Reset device — classifies at boot, no setup needed
ℹ️

Edge Impulse Studio: Upload your dataset at edgeimpulse.com, configure an impulse with Image (96×96 RGB) → Chroma-Sense → Classification, train in-browser, and deploy with edge-impulse-run-impulse.

08

Comparison with Prior Work

All baselines were built at 3× Chroma-Sense's feature-extractor width, matching their usual simultaneous processing of R, G, and B. All models were trained on the same dataset for 30 epochs with batch normalization. ❌ = model exceeded the device's flash or heap and could not run on it.

Model             | Apple Acc | Tomato Acc | RAM    | Flash  | Params | MADDs | H7 FPS | H7+ FPS | Nicla FPS
Conv2D            | 95.2      | 88.8       | 336 KB | 502 KB | 491K   | 130M  | ❌      | 1.7     | ❌
Grouped Conv      | 86.4      | 71.7       | 336 KB | 187 KB | 169K   | 43M   | ❌      | 4.5     | ❌
MobileNet         | 86.7      | 86.6       | 341 KB | 99 KB  | 66K    | 21M   | ❌      | 5.3     | ❌
MobileNetV2       | 95.4      | 90.4       | 271 KB | 203 KB | 153K   | 25M   | ❌      | 7.0     | ❌
MobileNetV3       | 92.8      | 92.6       | 216 KB | 138 KB | 97K    | 20M   | ❌      | 5.3     | 7.6
EfficientNetV2    | 91.8      | 88.6       | 240 KB | 160 KB | 110K   | 27M   | ❌      | 5.1     | 6.9
SqueezeNet        | 97.2      | 83.5       | 338 KB | 80 KB  | 56K    | 23M   | ❌      | 6.4     | ❌
ShuffleNet        | 90.2      | 83.0       | 347 KB | 129 KB | 88K    | 13M   | ❌      | 2.3     | ❌
Squ. & Excitation | 82.2      | 79.1       | 149 KB | 550 KB | 527K   | 32M   | ❌      | 5.6     | ❌
Multi-Ch. CNN     | 88.8      | 86.1       | 160 KB | 87 KB  | 62K    | 43M   | 4.7    | 4.7     | 4.0
Chroma-Sense      | 93.1      | 89.3       | 160 KB | 54 KB  | 15K    | 8.3M  | 8.4    | 8.4     | 7.2
09

Citation

@inproceedings{kethineni2025chromasense,
  title       = {Chroma-Sense: A Memory-Efficient Plant Leaf Disease Classification Model For Edge Devices},
  author      = {Kethineni, Kiran K. and Wu, Samuel Y. and Mohanty, Saraju P. and Kougianos, Elias},
  booktitle   = {Proceedings of the IEEE Conference on Artificial Intelligence (CAI)},
  year        = {2025},
  institution = {University of North Texas}
}

Related: SprayCraft — Graph-Based Route Optimization for Variable Rate Precision Spraying  arXiv:2412.12176