Overview
Plant disease classification CNNs are typically hosted on cloud servers — requiring internet connectivity, introducing latency, and exposing sensitive crop data. Edge computing moves inference to the device itself, but microcontrollers have severe memory constraints that most CNN architectures cannot satisfy.
Chroma-Sense is a novel architecture that classifies plant leaf diseases directly on the camera, running at real-time speeds on devices with as little as 256 KB of heap memory and 128 KB of flash — without any cloud connection.
⚠ The Problem
- Cloud CNNs require internet — unreliable in rural fields
- Network latency prevents real-time response
- Security risk: proprietary crop data leaves the farm
- Mobile architectures (MobileNet, EfficientNet) still too heavy for smallest devices
✓ Chroma-Sense Solution
- Runs entirely on-device — no internet needed
- 8.4 FPS on OpenMV H7 (256 KB heap, 128 KB flash)
- Only 15K parameters, 54 KB quantized model
- Deploys on drones, cameras, IoT nodes in the field
Core Insight
Plant disease symptoms — spots, rings, patches, mosaics, stripes — are semantically simple. More importantly, the same types of shapes appear in each individual RGB channel, just with different brightness distributions.
Per-Channel Anatomy of Apple Cedar Rust
Each channel contains a simple spot or ring — not the complex multi-color concentric pattern of the full image. A small CNN can learn to detect these simple shapes with very few feature maps. And since the same shape types appear in every channel, the same feature extractor works for all three.
Serial Multi-Channel Processing
Sharing the extractor slashes parameter count (no 3× duplication). Processing sequentially means only one channel's activations ever live in RAM — cutting peak memory by ~2/3 vs. parallel multi-channel approaches.
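The ~2/3 figure follows from simple arithmetic. A minimal sketch, assuming int8 activations and that the first convolution's output (96 × 96 × 8) dominates peak activation memory for one channel (the dominance assumption is ours, for illustration):

```python
def activation_bytes(h, w, c, bytes_per_value=1):
    # int8 activations: one byte per value
    return h * w * c * bytes_per_value

peak_one_channel = activation_bytes(96, 96, 8)  # 73,728 B after the first conv
peak_parallel = 3 * peak_one_channel            # R, G, B resident at once
peak_serial = peak_one_channel                  # one channel at a time
saving = 1 - peak_serial / peak_parallel        # ~2/3
```

The same extractor weights are reused for each pass, so only the activations, not the parameters, scale with the number of channels in flight.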
Why GlobalMaxPooling? Disease symptoms appear at random locations and in random quantities — a leaf may have 2 spots or 20. Average pooling scores a sparse leaf far lower than a dense one (e.g. 0.2 vs. 1.0), causing misclassifications. Max pooling returns a high score whenever any spot matches the filter, regardless of count — critical after Int8 quantization, which amplifies this score variance.
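A toy NumPy comparison makes the pooling argument concrete (illustrative numbers, not from the paper):

```python
import numpy as np

# Two post-activation feature maps: a sparse leaf (2 hot spots)
# vs. a dense leaf (spots everywhere), all activations in [0, 1]
sparse = np.zeros((6, 6))
sparse[1, 1] = sparse[4, 4] = 1.0
dense = np.ones((6, 6))

avg_sparse, avg_dense = sparse.mean(), dense.mean()  # ~0.06 vs 1.0: big gap
max_sparse, max_dense = sparse.max(), dense.max()    # 1.0 vs 1.0: same score
```

With average pooling the classifier sees very different scores for the same disease; with max pooling both leaves produce the same strong response.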
Memory Efficiency
Sequential channel processing cuts effective CNN width by ~2/3, reducing peak RAM by ~25% vs. MobileNetV3.
Fewer Parameters
Shared extractor across all channels: 15K params vs. 97K in the closest competitor.
Better Features
Grayscale-per-channel training forces shape/texture learning over color shortcuts.
Explainability
Channel-specific feature maps are interpretable — Grad-CAM highlights disease zones clearly.
Architecture
The SeparableConv2D layers follow MobileNet's depthwise-separable factorization — separating spatial filtering from channel mixing — cutting compute and parameters significantly. At just 32–48 feature maps per layer, the linear bottleneck / inverted residual blocks used by MobileNetV2 offer no benefit and were omitted.
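The parameter saving from the factorization is easy to verify by hand. A minimal arithmetic sketch for one 3×3 layer with 48 input and 48 output channels (the widest layers here; Keras-style counting with depth multiplier 1 and bias terms included):

```python
def conv2d_params(k, c_in, c_out):
    # Standard Conv2D: one k*k*c_in kernel per output channel, plus bias
    return k * k * c_in * c_out + c_out

def separable_conv2d_params(k, c_in, c_out):
    # Depthwise (k*k per input channel) + pointwise (1x1 mixing) + bias
    return k * k * c_in + c_in * c_out + c_out

dense_conv = conv2d_params(3, 48, 48)               # 20,784 params
separable = separable_conv2d_params(3, 48, 48)      # 2,784 params
ratio = dense_conv / separable                      # ~7.5x fewer
```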
Code Guide
Feature Extractor
```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, SeparableConv2D
from tensorflow.keras.models import Model

def build_feature_extractor(input_shape=(96, 96, 1)):
    inp = Input(shape=input_shape)
    x = Conv2D(8, (3, 3), padding='same', activation='relu')(inp)
    x = MaxPooling2D((2, 2))(x)
    for filters in [24, 48, 48, 48]:
        x = SeparableConv2D(filters, (3, 3), padding='same', activation='relu')(x)
        x = MaxPooling2D((2, 2))(x)
    return Model(inp, x)  # 7,352 params — reused 3×
```
Full Chroma-Sense Model
```python
from tensorflow.keras.layers import (Input, Lambda, Concatenate, Conv2D,
                                     GlobalMaxPooling2D, Dense)
from tensorflow.keras.models import Model

def build_chroma_sense(num_classes):
    fe = build_feature_extractor()  # shared weights
    rgb = Input(shape=(96, 96, 3))
    r = Lambda(lambda x: x[:, :, :, 0:1])(rgb)
    g = Lambda(lambda x: x[:, :, :, 1:2])(rgb)
    b = Lambda(lambda x: x[:, :, :, 2:3])(rgb)
    rf, gf, bf = fe(r), fe(g), fe(b)  # sequential, same extractor
    x = Concatenate()([rf, gf, bf])   # (3, 3, 144)
    x = Conv2D(48, (1, 1), activation='relu')(x)  # cross-channel attention
    x = Conv2D(16, (1, 1), activation='relu')(x)
    x = GlobalMaxPooling2D()(x)
    out = Dense(num_classes, activation='softmax')(x)
    return Model(rgb, out)
```
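As a sanity check on the 15K figure, the per-layer parameter arithmetic can be tallied by hand (the final Dense layer here assumes 4 apple classes — our assumption for illustration, not stated in the code):

```python
def sep_params(k, c_in, c_out):
    # SeparableConv2D: depthwise (k*k per input ch.) + pointwise + bias
    return k * k * c_in + c_in * c_out + c_out

extractor = (3 * 3 * 1 * 8 + 8               # Conv2D(8), 1-channel input
             + sep_params(3, 8, 24)
             + sep_params(3, 24, 48)
             + 2 * sep_params(3, 48, 48))    # shared across R, G, B

head = (144 * 48 + 48) + (48 * 16 + 16) + (16 * 4 + 4)  # 1x1 convs + Dense(4)
total = extractor + head                     # ~15K parameters
```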
Int8 Quantization (TFLite)
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = converter.inference_output_type = tf.int8

def representative_dataset():
    # Calibration samples for estimating int8 activation ranges
    # (val_dataset: the validation tf.data.Dataset)
    for imgs, _ in val_dataset.take(100):
        yield [imgs]

converter.representative_dataset = representative_dataset
open('trained.tflite', 'wb').write(converter.convert())  # → 54 KB
```
On-Device Inference (OpenMV / MicroPython)
```python
import sensor, tf, uos, gc, time

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.set_windowing((240, 240))
sensor.skip_frames(time=2000)  # let auto-exposure settle

labels = open("/labels.txt").read().splitlines()
net = tf.load("/trained", load_to_fb=uos.stat('/trained')[6] > gc.mem_free() - 65536)

clock = time.clock()
while True:
    clock.tick()
    img = sensor.snapshot()
    pred = sorted(zip(labels, net.classify(img)[0].output()),
                  key=lambda x: -x[1])
    print(f"{pred[0][0]} ({pred[0][1]:.0%}) — {clock.fps():.1f} FPS")
```
Training
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Input size | 96 × 96 × 3 |
| Batch size | 128 |
| Epochs (initial) | 250 |
| Learning rate | 0.0005 |
| Optimizer | Adam |
| Loss | Categorical Crossentropy |
| Fine-tune epochs | 10 |
| Fine-tune LR | 0.000045 |
| Layers unfrozen | Final 65% |
| Hardware | NVIDIA Tesla T4 |
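The "Final 65%" unfreezing policy from the table can be sketched as a small helper (`split_for_finetune` is a hypothetical name, not from the original training code):

```python
def split_for_finetune(layers, unfrozen_frac=0.65):
    """Split a layer list so the final `unfrozen_frac` share is trainable."""
    cut = len(layers) - round(len(layers) * unfrozen_frac)
    return layers[:cut], layers[cut:]

layers = [f"layer_{i}" for i in range(20)]
frozen, unfrozen = split_for_finetune(layers)
# 20 layers -> 7 stay frozen, the final 13 (65%) are fine-tuned
```

In Keras the frozen portion would get `layer.trainable = False` before recompiling with the lower fine-tune learning rate.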
Dataset
- Source: PlantVillage (Kaggle)
- Plants: Apple, Tomato, Grape, Corn
- Total: ~18,000 images, 18 classes
- Split: 80% train / 20% validation
- Resolution: 96×96×3 (cropped)
- Preprocessing: backgrounds removed, leaving the disease symptoms themselves as the only consistent visual features
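The 96×96 crop from the preprocessing step can be sketched as a plain NumPy center crop (illustrative; the exact cropping pipeline used by the authors is not specified):

```python
import numpy as np

def center_crop(img, size=96):
    """Center-crop an HxWxC array to size x size."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

leaf = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for a dataset image
patch = center_crop(leaf)
print(patch.shape)  # (96, 96, 3)
```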
Final (Apple): Training 98.08% · Validation 92.18% · Loss 0.2587
Results
F1 Scores by Plant & Disease (Int8)
Apple Subset — Validation & Test
| Split | Format | Accuracy | ROC-AUC | Weighted F1 | Loss |
|---|---|---|---|---|---|
| Validation | Int8 | 92.18% | 0.9909 | 0.922 | — |
| Validation | Float32 | 92.02% | 0.9920 | 0.920 | — |
| Test (815 samples) | Int8 | 91.78% | 0.9849 | 0.9177 | 0.5916 |
| Test (815 samples) | Float32 | 91.53% | 0.9870 | 0.9153 | 0.3259 |
Edge Device Deployment
Flash Pre-built Firmware
```shell
# Flash with Edge Impulse CLI
edge-impulse-flash-tool --firmware edge_impulse_firmware_openmv_cam_h7.bin

# Copy these to device flash:
#   trained.tflite             → /trained
#   labels.txt                 → /labels.txt
#   ei_image_classification.py → main.py

# Reset device — classifies at boot, no setup needed
```
Edge Impulse Studio: Upload your dataset at edgeimpulse.com, configure an impulse with Image (96×96 RGB) → Chroma-Sense → Classification, train in-browser, and deploy with edge-impulse-run-impulse.
Comparison with Prior Work
All baselines were built at 3× the feature-extractor width of Chroma-Sense, matching their typical simultaneous R+G+B processing, and all were trained on the same dataset for 30 epochs with batch normalization. ❌ = model exceeded flash or heap and could not run on that device.
| Model | Apple Acc (%) | Tomato Acc (%) | RAM | Flash | Params | MADDs | H7 FPS | H7+ FPS | Nicla FPS |
|---|---|---|---|---|---|---|---|---|---|
| Conv2D | 95.2 | 88.8 | 336 KB | 502 KB | 491K | 130M | ❌ | 1.7 | ❌ |
| Grouped Conv | 86.4 | 71.7 | 336 KB | 187 KB | 169K | 43M | ❌ | 4.5 | ❌ |
| MobileNet | 86.7 | 86.6 | 341 KB | 99 KB | 66K | 21M | ❌ | 5.3 | ❌ |
| MobileNetV2 | 95.4 | 90.4 | 271 KB | 203 KB | 153K | 25M | ❌ | 7.0 | ❌ |
| MobileNetV3 | 92.8 | 92.6 | 216 KB | 138 KB | 97K | 20M | ❌ | 5.3 | 7.6 |
| EfficientNetV2 | 91.8 | 88.6 | 240 KB | 160 KB | 110K | 27M | ❌ | 5.1 | 6.9 |
| SqueezeNet | 97.2 | 83.5 | 338 KB | 80 KB | 56K | 23M | ❌ | 6.4 | ❌ |
| ShuffleNet | 90.2 | 83.0 | 347 KB | 129 KB | 88K | 13M | ❌ | 2.3 | ❌ |
| Squ. & Excitation | 82.2 | 79.1 | 149 KB | 550 KB | 527K | 32M | ❌ | 5.6 | ❌ |
| Multi-Ch. CNN | 88.8 | 86.1 | 160 KB | 87 KB | 62K | 43M | 4.7 | 4.7 | 4.0 |
| Chroma-Sense | 93.1 | 89.3 | 160 KB | 54 KB | 15K | 8.3M | 8.4 | 8.4 | 7.2 |
Citation
Related: SprayCraft — Graph-Based Route Optimization for Variable Rate Precision Spraying arXiv:2412.12176