Your hardware has capabilities you’re not fully using. The sensors, MCUs, and radios sitting in your product could do more – but getting ML to run reliably on constrained devices is a different game than training models on cloud GPUs.
You need someone who understands the entire pipeline: from sensor physics and firmware constraints, through communication trade-offs, all the way to dashboards and user-facing behavior. Someone who can keep battery life, latency, robustness, and security in view simultaneously – not just benchmark accuracy.
That’s what I bring: ML pipelines designed as a business optimization problem, built to survive real-world conditions like power limits, communication dropouts, and months of unattended operation.
If this is you:
- You have hardware with untapped capabilities – sensors, MCUs, radios, Jetson boards – and feel you’re using only a fraction of what they can really do.
- You need someone who can own the whole pipeline from sensor and MCU, through communication and ML models, all the way to dashboards and user-facing behavior.
- You care about battery life, robustness, latency, and security – not just benchmark accuracy.
- You want ML that is aligned with business goals and product constraints, not just state-of-the-art on paper.
Working with me, you get:
- ML pipelines designed as a business optimization problem, not a pure research exercise.
- A partner who understands hardware, firmware, communication, ML, and UX – and can keep all constraints visible at the same time.
- Someone who can translate between your stakeholders: executives, domain experts, firmware engineers, data scientists, and designers.
- Systems built to survive real life: power limits, communication dropouts, user behavior, and long-running unattended operation.
Core: Technical Skills
- Signal processing – turning raw signals (biosignals, inertial data, etc.) into meaningful, compact features.
- Computer vision – designing and deploying CV systems on constrained and embedded hardware.
- Machine learning and modeling – from classic ML to deep learning, designed to run on real devices, not only on large cloud GPUs.
- Parallel and high-performance computing – making sure both preprocessing and models actually use the available compute; I have also written a short tutorial on testing multi-process code.
- C and Python – low-level firmware and performance-critical code, plus high-level ML tools and glue.
- R&D and experimentation – taking a research paper or “paper with code” and driving it to a tested, deployable system.
- Project ownership – structuring work so models move from prototype into production instead of dying as demos.
A key capability: I can start from a research codebase and deliver a production deployment on constrained hardware, with explicit trade-offs and validation.
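As a small illustration of the on-device signal compression mentioned above, here is a minimal sketch. The function name, decimation factor, and feature choices are illustrative, not taken from any specific project; real deployments would also apply an anti-aliasing filter before decimating.

```python
import numpy as np

def compress_window(samples: np.ndarray, decimate: int = 8) -> dict:
    """Reduce a raw sensor window to a handful of compact features.

    Decimation plus summary statistics is one common way to shrink
    payloads before radio transmission. Illustrative sketch only:
    proper decimation needs an anti-aliasing filter first.
    """
    x = samples[::decimate]                      # crude decimation
    zero_crossings = int(np.sum(np.diff(np.sign(x)) != 0))
    return {
        "rms": float(np.sqrt(np.mean(x ** 2))),  # signal energy
        "zc": zero_crossings,                    # rough frequency proxy
        "peak": float(np.max(np.abs(x))),        # amplitude extreme
    }
```

A window of hundreds of raw samples collapses into three numbers, which is the difference between streaming continuously and sending one small packet per window.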
Core: Communication and People Skills
- Stakeholder communication – I talk comfortably with engineers, researchers, clinicians, operations, sales, and executives, adjusting detail and language for each group. Your sales team quickly gets a clear picture of the buying persona, and your management team can pick the delivery methodology that fits them.
- Needs discovery and alignment – I extract what people really need and align technical objectives with those needs.
- People management and mentoring – years of teaching and mentoring help me guide teams, support junior engineers, and keep collaboration healthy.
- Cross-functional coordination – I keep firmware, data, ML, product, and business teams on the same page and reduce misalignment.
You don’t need one person to “do the ML,” another to “talk to the hardware team,” and a third to “explain it to management.” I reduce those hand-offs.
Bonus: Full-Stack Understanding
I understand the full stack:
- Hardware and sensors – capabilities and limits of MCUs, radios, and sensors, including low-power wireless SoCs such as Nordic’s nRF family. As a hobby project, I designed a PCB for a low-power, low-frequency application.
- Firmware and embedded software – on-device preprocessing, feature extraction, and decision logic.
- Communication protocols – wired and wireless, payload formats, bandwidth and latency trade-offs.
- Edge and cloud computing – deciding what to run where, and how to keep the system robust.
- Front-end and web interfaces – presenting results and controls in a usable and understandable way.
This lets me place each ML component where it makes the most sense: on the sensor, on the MCU, on an Edge box, or in the cloud. The overall system behavior and business utility stay at the center.
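One way to make that placement decision concrete is a simple heuristic over the latency budget and link speed. The thresholds below are invented purely for illustration; real decisions also weigh power, model size, and security.

```python
def placement(latency_ms: float, payload_kb: float, link_kbps: float) -> str:
    """Illustrative heuristic for where to run an ML component.

    Thresholds are made up for this sketch; they stand in for the
    real constraints (power, model size, security) weighed in practice.
    """
    uplink_ms = payload_kb * 8 / link_kbps * 1000  # time to ship the payload
    if latency_ms < 50 or uplink_ms > latency_ms:
        return "device"   # must decide locally: no time to transmit
    if latency_ms < 500:
        return "edge"     # a LAN round-trip fits the budget
    return "cloud"        # budget allows a full upload
```

Note that a slow link can force on-device inference even when the latency budget is generous: if shipping the payload takes longer than the budget, the decision has to happen locally.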
Low-Compute Embedded ML
I have delivered three major embedded projects on low-power MCUs and Raspberry Pi, each with very different primary constraints.
Health Monitoring – Statistics Under Strict Communication and Battery Constraints
Body-related signals were collected on a low-power device. The business output was essentially a statistic integrated over time rather than an instant reaction; the acceptable response horizon was on the order of days. Battery life and communication costs were the dominant constraints, with radio usage as the main battery drain.
Solution and trade-offs: The pipeline was designed for very low communication volume and high tolerance for data dropouts and delayed uploads. This meant strong decimation and feature compression on the device, then upload of compact summaries instead of continuous streams. Caching at each level (device, edge, cloud) made the system tolerant to transient communication failures.
Health Monitoring – Real-Time ML on Bio-Signals
The system had to react to changes in the human body in near real time, with system response under one second and a target latency of around 100 ms where possible. Machine learning performance was critical – the target was very high detection quality, close to 99%. Power constraints were present but secondary to responsiveness and accuracy.
Solution and trade-offs: The optimization focused on full end-to-end reaction time, from body signal to decision. Any decision that could be made on-device or on an Edge node, without going to the cloud, improved reactivity. Logic was moved as close to the signal source as possible.
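A simple latency-budget tally shows why moving logic toward the signal source pays off. The stage names and millisecond values below are illustrative, not measured numbers from this project.

```python
def end_to_end_latency_ms(stages: dict) -> float:
    """Sum per-stage latencies for one sensing-to-decision path.

    Stage names and values are illustrative placeholders, not
    measurements from a real deployment.
    """
    return sum(stages.values())

# Deciding on the device keeps the whole path local.
on_device = {"sampling": 10, "features": 5, "inference": 15, "actuation": 5}

# Routing through the cloud adds two network legs to the same path.
via_cloud = {"sampling": 10, "features": 5, "uplink": 80,
             "cloud_inference": 20, "downlink": 80, "actuation": 5}
```

Even with optimistic network numbers, the two radio legs alone can consume the entire ~100 ms target, which is why on-device and edge decisions dominated the design.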
Home Automation – Unattended Operation Over the Public Internet
The system is based on a Raspberry Pi as the main controller. The key business constraint was unattended long-term operation over the internet: the system had to run for months or years without manual intervention, and engineer intervention more than once per year would be considered a bad outcome. Security and reliability were central.
Solution and trade-offs: Communication was designed with zero direct attack surface on the device – no open inbound ports, only outbound connections to trusted services. This approach was showcased by Back4app. Several layers of monitoring and watchdog mechanisms were implemented, including Raspberry Pi watchdogs, multiple UPS layers, and cloud data backup. This project has been in production since 2018 and is still live. The longest maintenance-free run reached 1130 days.
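The outbound-only loop with a watchdog can be sketched in a few lines. The function names are illustrative; on a Raspberry Pi the feed callable would typically write a byte to the kernel watchdog device, but here it is injected so the control flow stays testable.

```python
def supervision_cycle(poll, handle, feed_watchdog):
    """One pass of an outbound-only control loop (illustrative sketch).

    The device opens no inbound ports: it polls a trusted service for
    pending commands, acts on them, then feeds the watchdog so a hung
    loop eventually triggers a hardware reset. On a Raspberry Pi,
    `feed_watchdog` would typically write a byte to /dev/watchdog;
    here all three callables are injected placeholders.
    """
    for cmd in poll():        # outbound request only: zero inbound surface
        handle(cmd)           # application-defined command handling
    feed_watchdog()           # prove liveness to the hardware watchdog
```

If the loop stalls – network hang, software bug, SD-card trouble – the watchdog stops being fed and the hardware reboots the device, which is what makes year-long unattended runs realistic.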
High-Compute Embedded AI: NVIDIA Jetson Orin
On the powerful side of embedded AI, I work with the NVIDIA Jetson Orin family of modules. These provide compact, power-efficient platforms with up to hundreds of TOPS of AI performance for edge workloads.
These modules are essentially small AI computers with Arm CPUs, integrated GPUs, and dedicated accelerators for AI and media. The software stack – JetPack, CUDA, TensorRT – supports advanced robotics and edge AI.
In real products, Jetson Orin boards still face constraints: power budgets in battery-powered or mobile systems, thermal limits and enclosure constraints, communication costs and bandwidth limits.
I know how to use dedicated IP cores and accelerators on Orin to run ML workloads efficiently instead of pushing everything to the CPU. I tailor model architectures and pipelines so they map well to this hardware and its power envelope. I bring the same systems view that I use for low-power devices, so the end result is not only fast but also deployable, robust, and matched to your business needs.
