Baby Monitor

I’ve been struck by how many baby monitors have died on us over our children’s early years. They seem to last only a year or two, and they cost a lot! So I’ve finally decided to make my own.

Note: Please read my Results section at the end before trying to duplicate my steps…

Hardware

I’m using a pair of ESP32 boards here, because they’re popular, common, and cheap. They have ESP-Now, which lets the boards talk to each other without a Wi-Fi network, plus an I2S audio input and a DAC for audio output. You should be able to find them on eBay for under $10 each.

For the microphone, I’m using an INMP441. It’s a 24-bit I2S non-PDM microphone which is, you guessed it, cheap. You should be able to find them for under $5 on eBay.

The speaker I used is an 8Ω one I had lying around from a learn-electronics kit. It’s very quiet. If you were doing this project for real, you would probably want to add a small amplifier board.

Microphone wiring

Before connecting the microphone to one of the ESP32s, I recommend reading about I2S on Wikipedia. Here are the connections:

L/R (left/right) -> This goes to ground. According to the INMP441 datasheet this makes it the left channel.
WS (word select, aka LRCLK) -> This goes to pin D15 on the ESP32.
SCK (serial clock, aka BCLK/bit clock) -> D14.
SD (serial data, aka DOUT) -> D34.
VDD -> To +3.3V.
GND -> Ground.

Speaker wiring

To connect the speaker (or line-out) to the other ESP32:

One wire to ground.
Another wire to D25 on the ESP32.

Arduino

I’m using Arduino to program the ESP32s, because it’s simpler than Espressif’s toolchain. Install Arduino, then follow these instructions to add ESP32 support.

You’ll probably need to install drivers to suit whatever USB-UART bridge chip is on your boards. There should be two chips near the USB connector: one is power (not many pins), and one is the UART (lots of pins). Read the chip number, find the manufacturer’s page, and find and install the drivers. In my case, it was a Silabs CP2102.

The Arduino settings that work for me are as follows; perhaps they will work for you too:

Tools > Board > ESP32 Wrover module
Tools > Port > /dev/cu.SLAB_USBtoUART
Tools > Programmer > ArduinoISP

I find that I need to hold down the ‘boot’ button on my boards while programming them.

ESP-Now setup

You’ll need to find the MAC of the receiving ESP32. Here’s some Arduino code to do this:

#include "WiFi.h"

void setup() {
    Serial.begin(115200);
    WiFi.mode(WIFI_MODE_STA);
    Serial.println(WiFi.macAddress());
}

void loop() {}
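
The sketch above prints the MAC as text in the form AA:BB:CC:DD:EE:FF, but the transmitter wants it as six bytes. If you'd rather paste the string than retype it as hex literals, something like the following might do. It is only a sketch: `parseMac` is a hypothetical helper of my own naming, not part of the ESP32 libraries, and it is plain C++ so it can be tried on a desktop compiler first.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not part of the ESP32 libraries): parses a MAC string
// such as "24:6F:28:AA:BB:CC" into six bytes. Returns true on success.
bool parseMac(const char *text, uint8_t out[6]) {
    unsigned int b[6];
    char trailing;
    // Each %2x reads up to two hex digits; the final %c only matches if there
    // are extra characters after the sixth byte, which we treat as an error.
    int n = std::sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x%c",
                        &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &trailing);
    if (n != 6) { return false; }
    for (int i = 0; i < 6; i++) { out[i] = (uint8_t)b[i]; }
    return true;
}
```

In the transmitter you could then call `parseMac("24:6F:28:AA:BB:CC", receiverMAC)` before adding the peer. That address is a made-up placeholder; use whatever your receiving board prints.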

Transmitter

Here’s the code for the ESP32 that listens to the mic via the I2S input, converts the samples to 8 bits, and transmits them over ESP-Now:

#include "WiFi.h"
#include "esp_now.h"
#include "driver/i2s.h"

uint8_t receiverMAC[] = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66}; // <- Replace this with the MAC of your other board!

void setup() {
    Serial.begin(115200);
    
    WiFi.mode(WIFI_MODE_STA); // Wifi (prerequisite for ESP-Now).

    // Setup ESP-Now first, because I2S uses it.
    Serial.println("Setup ESP-Now...");
    if (ESP_OK != esp_now_init()) {
        Serial.println("esp_now_init: error");
        return;
    }
    esp_now_peer_info_t peerInfo = {0};
    memcpy(peerInfo.peer_addr, receiverMAC, sizeof(receiverMAC));
    // TODO encrypt, by setting peerInfo.lmk.
    if (ESP_OK != esp_now_add_peer(&peerInfo)) {
        Serial.println("esp_now_add_peer: error");
        return;
    }

    // I2S.
    Serial.println("Setup I2S...");
    i2s_config_t i2s_config = {
        .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
        .sample_rate = 11025,
        .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT, // INMP441 is 24 bits, but it doesn't work if we set 24 bit here.
        .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
        .communication_format = i2s_comm_format_t(I2S_COMM_FORMAT_I2S | I2S_COMM_FORMAT_I2S_MSB),
        .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
        .dma_buf_count = 4,
        .dma_buf_len = ESP_NOW_MAX_DATA_LEN * 4, // * 4 for 32 bit.
        .use_apll = false,
        .tx_desc_auto_clear = false,
        .fixed_mclk = 0,
    };
    if (ESP_OK != i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL)) {
        Serial.println("i2s_driver_install: error");
    }
    i2s_pin_config_t pin_config = {
        .bck_io_num = 14,   // Bit Clock.
        .ws_io_num = 15,    // Word Select.
        .data_out_num = -1,
        .data_in_num = 34,  // Data-out of the mic.
    };
    if (ESP_OK != i2s_set_pin(I2S_NUM_0, &pin_config)) {
        Serial.println("i2s_set_pin: error");
    }
    i2s_zero_dma_buffer(I2S_NUM_0);

    Serial.println("Setup done.");
}

// This is used to scale the audio when things get loud, and gradually increase sensitivity when things go quiet.
#define RESTING_SCALE 127
int32_t scale = RESTING_SCALE;

void loop() {
    // Read from the mic over I2S. This comes in as signed data with an extra byte.
    size_t bytesRead = 0;
    uint8_t buffer32[ESP_NOW_MAX_DATA_LEN * 4] = {0};
    i2s_read(I2S_NUM_0, &buffer32, sizeof(buffer32), &bytesRead, 1000);
    int samplesRead = bytesRead / 4;

    // Convert to 16-bit signed.
    // It's actually 24-bit, but the lowest byte is just noise, even in a quiet room.
    // If we go to 16 bit we don't have to worry about extending a sign byte.
    // Quiet room seems to be values maxing around 7.
    // Max seems around 300 with me at 0.5m distance talking at normal loudness.
    int16_t buffer16[ESP_NOW_MAX_DATA_LEN] = {0};
    for (int i=0; i<samplesRead; i++) {
        // Offset + 0 is always E0 or 00, regardless of the sign of the other bytes,
        // because our mic is only 24-bits, so discard it.
        // Offset + 1 is the LSB of the sample, but is just fuzz, discard it.
        uint8_t mid = buffer32[i * 4 + 2];
        uint8_t msb = buffer32[i * 4 + 3];
        uint16_t raw = (((uint32_t)msb) << 8) + ((uint32_t)mid);
        memcpy(&buffer16[i], &raw, sizeof(raw)); // Copy the raw bits so the sign is preserved.
    }

    // Find the peak absolute level in this buffer.
    int32_t max = 0; // 32-bit, so negating -32768 can't overflow.
    for (int i=0; i<samplesRead; i++) {
        int32_t val = buffer16[i];
        if (val < 0) { val = -val; }
        if (val > max) { max = val; }
    }

    // Push up the scale if volume went up.
    if (max > scale) { scale = max; }
    // Gradually drop the scale when things are quiet.
    if (max < scale && scale > RESTING_SCALE) { scale -= 300; }
    if (scale < RESTING_SCALE) { scale = RESTING_SCALE; } // Dropped too far.

    // Scale it to int8s so we aren't transmitting too much data.
    int8_t buffer8[ESP_NOW_MAX_DATA_LEN] = {0};
    for (int i=0; i<samplesRead; i++) {
        int32_t scaled = ((int32_t)buffer16[i]) * 127 / scale;
        if (scaled <= -127) {
            buffer8[i] = -127;
        } else if (scaled >= 127) {
            buffer8[i] = 127;
        } else {
            buffer8[i] = scaled;
        }
    }

    // Send to the other ESP32.
    if (ESP_OK != esp_now_send(NULL, (uint8_t *)buffer8, samplesRead)) {
        Serial.println("Error: esp_now_send");
        delay(500);
    }
}
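
The per-packet maths in loop() is easier to check if it is pulled out into small pure functions that also compile on a desktop. The following is a sketch of that idea rather than a drop-in replacement. The function names (`top16`, `peakOf`, `nextScale`, `toInt8`) are hypothetical; the constants and the logic are meant to mirror the transmitter code above, including its assumption that each 4-byte group holds the useful bits in bytes 2 and 3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const int32_t kRestingScale = 127;  // Same value as RESTING_SCALE above.
const int32_t kDecayStep = 300;     // Same step the loop() above subtracts.

// Take bytes 2 and 3 of one 4-byte I2S slot as a signed 16-bit sample.
int16_t top16(const uint8_t *slot) {
    uint16_t raw = (uint16_t)((slot[3] << 8) | slot[2]);
    return (int16_t)raw;  // Reinterpret the 16 bits as two's complement.
}

// Largest absolute value in the buffer, held in 32 bits so that
// negating -32768 cannot overflow.
int32_t peakOf(const int16_t *samples, size_t count) {
    int32_t peak = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t v = samples[i];
        if (v < 0) { v = -v; }
        if (v > peak) { peak = v; }
    }
    return peak;
}

// Jump up immediately when things get loud, decay gradually when quiet.
int32_t nextScale(int32_t scale, int32_t peak) {
    if (peak > scale) {
        scale = peak;
    } else if (peak < scale) {
        scale -= kDecayStep;
    }
    return scale < kRestingScale ? kRestingScale : scale;
}

// Scale one sample into the -127..127 range, clamping anything louder.
int8_t toInt8(int16_t sample, int32_t scale) {
    assert(scale > 0);  // nextScale() never returns less than kRestingScale.
    int32_t scaled = (int32_t)sample * 127 / scale;
    if (scaled < -127) { return -127; }
    if (scaled > 127) { return 127; }
    return (int8_t)scaled;
}
```

Feeding these a few hand-made buffers (a quiet one, a loud one, one containing -32768) is a cheap way to rule the arithmetic in or out when chasing distortion.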

Receiver

#include "WiFi.h"
#include "esp_now.h"
#include "driver/i2s.h"

// Called when ESP-Now receives.
void onDataRecv(const uint8_t *mac, const uint8_t *incomingRaw, int samples) {
    // Convert it from 8 bit signed to 16 bit unsigned with an 0x80 delta which is what the DAC requires.
    int8_t *incoming8 = (int8_t *)incomingRaw;
    uint16_t incoming16[ESP_NOW_MAX_DATA_LEN] = {0};
    for (int i=0; i<samples; i++) {
        int32_t value = incoming8[i];
        value += 0x80; // DAC wants unsigned values with a bias, not signed!
        incoming16[i] = value << 8;
    }

    // Forward it to the DAC.
    size_t bytesWritten=0;
    i2s_write(I2S_NUM_0, incoming16, samples * 2, &bytesWritten, 500);
}

void setup() {
    Serial.begin(115200);

    // Setup I2S first, because the ESP-Now listener uses it.
    i2s_config_t i2s_config = {
        .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX | I2S_MODE_DAC_BUILT_IN),
        .sample_rate = 11025,
        .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
        .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
        .communication_format = I2S_COMM_FORMAT_I2S_MSB,
        .intr_alloc_flags = 0,
        .dma_buf_count = 4,
        .dma_buf_len = ESP_NOW_MAX_DATA_LEN * 2,
        .use_apll = false
    };
    i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
    i2s_zero_dma_buffer(I2S_NUM_0);
    i2s_set_pin(I2S_NUM_0, NULL);

    // ESP-Now.
    Serial.println("ESP-Now setup...");
    WiFi.mode(WIFI_STA);
    if (esp_now_init() != ESP_OK) {
        Serial.println("Setup > ESP-Now error");
        return;
    }
    esp_now_register_recv_cb(onDataRecv);

    Serial.println("Setup complete.");
}

void loop() {}
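
The sample conversion in onDataRecv is small enough to test on its own. As a sketch (the name `toDacWord` is hypothetical, and the layout is simply what the receiver code above assumes: an unsigned value, biased by 0x80, sitting in the upper byte of a 16-bit word), it could be pulled out like this:

```cpp
#include <cstdint>

// Hypothetical helper: maps a signed 8-bit sample (-128..127) to the unsigned,
// biased 16-bit word that the receiver code above writes to I2S.
uint16_t toDacWord(int8_t sample) {
    int32_t biased = (int32_t)sample + 0x80;  // Now in the range 0..255.
    return (uint16_t)(biased << 8);           // Move it into the upper byte.
}
```

With this mapping, silence (0) becomes 0x8000, the loudest positive sample (127) becomes 0xFF00, and -127 becomes 0x0100.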

Results

The sound is very distorted, and I could never discover why.

I came to the conclusion that the ESP32 (hardware / software) is just too buggy, and not documented well enough, to be taken seriously.

Etc

Things you might want to consider, because this project is far from perfect:

Using an ESP-Now encryption key.
Making the ‘sender’ not transmit a packet if the sound level is quiet, which will likely extend the life of the radio circuitry.
Making the ‘receiver’ handle missed packets gracefully and treat them as quiet.
Some kind of amplifier, or I2S sound output from the receiver, for better sound quality.
Mounting it nicely!
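
As a starting point for the "don't transmit when quiet" idea, here is a minimal sketch of a silence gate. Everything in it is an assumption: the class name `SilenceGate`, the threshold, and the hold count are all made up for illustration. The hold count keeps a few packets flowing after the last loud one, so the tail of a sound isn't chopped off.

```cpp
#include <cstdint>

// Hypothetical silence gate: send while the peak is above a threshold, and
// for a few packets afterwards so the end of a sound is not cut off.
class SilenceGate {
public:
    SilenceGate(int32_t threshold, int holdPackets)
        : threshold_(threshold), holdPackets_(holdPackets), remaining_(0) {}

    // Call once per buffer with that buffer's peak absolute level.
    bool shouldSend(int32_t peak) {
        if (peak > threshold_) {
            remaining_ = holdPackets_;  // Loud: restart the hold countdown.
            return true;
        }
        if (remaining_ > 0) {
            remaining_--;               // Quiet, but still within the hold.
            return true;
        }
        return false;                   // Quiet for long enough: skip sending.
    }

private:
    int32_t threshold_;   // Peak level that counts as "sound".
    int holdPackets_;     // Packets to keep sending after the last loud one.
    int remaining_;       // Hold packets left to send.
};
```

In the transmitter, this would mean creating something like `SilenceGate gate(20, 2)` and only calling esp_now_send when `gate.shouldSend(max)` returns true. A threshold of 20 is a guess sitting between the quiet-room peak of about 7 and the speech peak of about 300 noted in the transmitter comments, so expect to tune it. The receiver would then also need to treat missing packets as silence.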

Thanks for reading, I hope this helps someone, and have a great week!

Legals: I take no responsibility; give no guarantee/warranty for this project.

Thanks for reading! And if you want to get in touch, I'd love to hear from you: chris.hulbert at gmail.

Chris Hulbert

(Comp Sci, Hons - UTS)

Software Developer (Freelancer / Contractor) in Australia.

I have worked at places such as Google, Cochlear, Assembly Payments, News Corp, Fox Sports, NineMSN, FetchTV, Coles, Woolworths, Trust Bank, and Westpac, among others. If you're looking for help developing an iOS app, drop me a line!


Making a baby monitor out of a couple of ESP32s, an I2S microphone, and a small speaker