Gathering Structured Data From Phone Calls

A lot of information these days is just a Google search away, but a surprising number of businesses still keep information like pricing locked behind phone lines. Often this is deliberate, and the reasons vary:

  • Fluctuating prices that change based on demand, inventory, or seasonality.
  • Sales psychology that converts curious callers into customers.
  • A competitive advantage in keeping pricing opaque to competitors.
  • Personalized quotes that change based on customer need.
  • Old-school businesses that just never went digital.

Traditionally, to gather information from these businesses, you would need someone or even multiple people to work through an endless call list, navigating phone menu trees, waiting on hold, and manually transcribing conversations into spreadsheets. This is tedious, expensive, slow, and doesn’t scale.

At Setfive, we decided to look into how we could automate this.

OpenAI Realtime API

The timing couldn’t have been better. As we were exploring ways to do this, OpenAI released its Realtime API, a game-changer for voice-based AI applications. Unlike conventional text-based APIs that require separate speech-to-text and text-to-speech steps, the Realtime API works with audio directly and enables:

  • Low-latency native voice conversations.
  • Natural interruptions for more human-like interactions.
  • Built-in function calling for triggering actions mid-conversation.

This was an AI capable of having an actual over-the-phone conversation.

Building The Bridge

With the brain of the operation sorted, it was time to find a way to actually make phone calls. For this, we chose Twilio, a well-regarded telecommunications platform that has been around for almost two decades.

Twilio’s Media Streams API made it simple to pipe audio directly to and from the OpenAI Realtime API, creating a seamless conversation flow. The business on the other end hears a responsive customer who can handle unexpected conversational turns.
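To make the piping a little more concrete, here is a minimal Python sketch of the translation layer such a bridge needs. It is not our production code, and it rests on a few assumptions: the call is answered with a TwiML <Connect><Stream> pointing at a WebSocket server you run (the URL below is a placeholder), the Realtime session is configured for 8 kHz G.711 mu-law audio so Twilio’s base64 payloads can be forwarded without transcoding, and the audio event is named either "response.audio.delta" or "response.output_audio.delta" depending on the API version. The function names are hypothetical, and the WebSocket plumbing itself is left out.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional

# Event names differ between Realtime API versions; accepting both is an assumption.
AUDIO_DELTA_TYPES = {"response.audio.delta", "response.output_audio.delta"}


def stream_twiml(websocket_url: str) -> str:
    """Build TwiML that asks Twilio to open a bidirectional Media Stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=websocket_url)
    return ET.tostring(response, encoding="unicode")


def twilio_to_openai(raw: str) -> Optional[str]:
    """Turn a Twilio "media" message into a Realtime audio-append event."""
    message = json.loads(raw)
    if message.get("event") != "media":
        return None  # "connected", "start", "mark" and "stop" carry no audio
    return json.dumps(
        {"type": "input_audio_buffer.append", "audio": message["media"]["payload"]}
    )


def openai_to_twilio(raw: str, stream_sid: str) -> Optional[str]:
    """Turn a Realtime audio delta into a Twilio "media" message."""
    event = json.loads(raw)
    if event.get("type") not in AUDIO_DELTA_TYPES:
        return None  # transcripts, function calls, etc. are handled elsewhere
    return json.dumps(
        {"event": "media", "streamSid": stream_sid, "media": {"payload": event["delta"]}}
    )
```

In a real bridge, two WebSocket loops would call these functions in opposite directions: one reading from Twilio and writing to OpenAI, the other doing the reverse.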

Navigating The Maze

One of the first challenges we ran into? Phone trees. You know them: “Press 1 for appointments, Press 2 to speak to a customer service representative, …” These interactive voice response (IVR) systems are designed for touch-tone input, not voice commands.

We solved this by building AI tools that simulate DTMF (Dual-Tone Multi-Frequency) signals using Twilio’s API, which required some trial and error with their callback and TwiML architecture. With these tools, our AI can listen to menu options, simulate button presses, navigate complex multi-level menus, and find the fastest path to a customer service representative or front desk.
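As an illustration of the general idea (not the exact implementation), the sketch below pairs a hypothetical "press_keys" function tool with a helper that builds the TwiML to play the tones. It assumes Twilio’s <Play digits="..."> verb is used to send DTMF, where "w" inserts a half-second pause, and that the media stream has to be reconnected afterwards because redirecting a live call to new TwiML ends the current stream. The tool name, its schema, and the WebSocket URL are all made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tool definition the model could call mid-conversation.
PRESS_KEYS_TOOL = {
    "type": "function",
    "name": "press_keys",
    "description": "Press keypad buttons to navigate a phone menu.",
    "parameters": {
        "type": "object",
        "properties": {
            "digits": {
                "type": "string",
                "description": "Keys to press, e.g. '2' or '1w3' (w = short pause).",
            }
        },
        "required": ["digits"],
    },
}

# Keypad characters plus "w" for a pause; anything else is rejected.
VALID_DIGITS = re.compile(r"[0-9*#w]+")


def dtmf_twiml(digits: str, websocket_url: str) -> str:
    """Build TwiML that plays DTMF tones, then reconnects the media stream."""
    if not VALID_DIGITS.fullmatch(digits):
        raise ValueError(f"unsupported DTMF sequence: {digits!r}")
    response = ET.Element("Response")
    ET.SubElement(response, "Play", digits=digits)
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=websocket_url)
    return ET.tostring(response, encoding="unicode")
```

When the model calls the tool, the resulting TwiML could be applied to the in-progress call, for example with the Twilio Python helper’s client.calls(call_sid).update(twiml=...). Validating the digits first keeps a confused model from injecting arbitrary markup into the call.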

From Conversations To Structured Data

Getting through to the right person is only half the battle. The real magic happens when our AI finally gets into a conversation. From there, we are able to extract structured information from free-flowing conversations in real time. Using carefully crafted prompts, our system can:

  • Identify key information even when it’s mentioned casually.
  • Ask clarifying questions about discrepancies in the information received.
  • Extract additional valuable data like availability, pricing details (first-time customer, minimum orders, etc.), and more.
  • Create clean, structured data ready for your database, Excel spreadsheet, or whatever else you’re using.
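One way to get from a free-flowing conversation to a clean row is to have the model report what it heard through a function call, then validate the arguments before saving them. The Python sketch below assumes exactly that; the QuoteRecord fields are hypothetical and would be tailored to whatever is being collected.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class QuoteRecord:
    """Hypothetical schema for one pricing answer gathered on a call."""
    business_name: str
    service: str
    price_usd: float
    minimum_order: Optional[int] = None
    first_time_discount: Optional[str] = None
    availability: Optional[str] = None


def parse_quote(arguments_json: str) -> QuoteRecord:
    """Validate the JSON arguments of a function call into a QuoteRecord."""
    data = json.loads(arguments_json)
    known = {f.name for f in fields(QuoteRecord)}
    # Ignore unexpected keys; a missing required key raises TypeError.
    record = QuoteRecord(**{k: v for k, v in data.items() if k in known})
    record.price_usd = float(record.price_usd)
    if record.price_usd < 0:
        raise ValueError("price cannot be negative")
    return record


def to_csv(records: List[QuoteRecord]) -> str:
    """Serialize records as CSV text that opens directly in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(QuoteRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Rejecting malformed arguments at this step also gives the system a natural hook for asking a clarifying question while the person is still on the line.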

When Nobody Answers

Here’s something we didn’t anticipate: businesses that rely heavily on phone communication are often too busy to answer their phones. Many are small businesses that have no dedicated staff for handling phones, or whose employees wear multiple hats. They’re not sitting by the phone waiting for calls.

This was having a real effect on our success rate, and we didn’t want to make multiple calls to the same business, hoping for someone to be available. The next step was obvious: voicemail. We enhanced our system to handle a full communication cycle:

  • Intelligent voicemail detection that recognizes when we have reached a voicemail inbox.
  • Natural voicemail messages that request whatever information the AI is looking for.
  • Callback handling that naturally continues the conversation when a business calls back.
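The cycle above could be wired together in many ways. The Python sketch below shows one possibility, assuming Twilio’s answering machine detection is enabled on the outbound call and reports an AnsweredBy value such as "human", "machine_start", "machine_end_beep", "fax", or "unknown". The Action enum and the PendingRequests store are hypothetical helpers, and a real system would persist pending requests rather than keep them in memory.

```python
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    CONVERSE = "converse"
    LEAVE_VOICEMAIL = "leave_voicemail"
    WAIT = "wait"
    HANG_UP = "hang_up"


def action_for(answered_by: str) -> Action:
    """Map an AnsweredBy value to what the caller should do next."""
    if answered_by.startswith("machine_end"):
        return Action.LEAVE_VOICEMAIL  # greeting finished, safe to speak
    if answered_by == "machine_start":
        return Action.WAIT  # greeting still playing
    if answered_by == "fax":
        return Action.HANG_UP
    return Action.CONVERSE  # "human" or "unknown": let the AI talk


class PendingRequests:
    """Remember what was asked in a voicemail so a callback can pick it up."""

    def __init__(self) -> None:
        self._by_number: Dict[str, str] = {}

    def remember(self, business_number: str, question: str) -> None:
        self._by_number[business_number] = question

    def resume(self, caller_number: str) -> Optional[str]:
        """Return and clear the open question for a returning caller, if any."""
        return self._by_number.pop(caller_number, None)
```

When a business calls back, looking up its number lets the AI open with the original question instead of starting from scratch.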

Ready to Build?

Interested in how this can help you? Email us at contact@setfive.com to find out more or check out our demo at voice2data.setfive.com!