Musing: What’s the next $1bn+ “tools” market?

I was catching up with a friend of mine yesterday who’s looking to build a company in the wearables space, and we started chatting about how fast the wearables market has been growing. The conversation stuck with me, and as I left lunch I started wondering what the next big “tools” markets were going to be. The “tools” metaphor refers to the observation that during a gold rush it’s usually more profitable to sell the pickaxes, wheelbarrows, and other supplies that miners need than to prospect for gold yourself. Chris Dixon has an interesting post describing this phenomenon that’s worth a read, Selling pickaxes during a gold rush. Some recent examples include the growth of collaborative open source development fostering GitHub and the shift to “cloud infrastructure” spawning PaaS companies like Heroku. Anyway, what areas might end up creating $1bn+ tools markets?

IoT and Wearables

2014 might well go down as the year of the “Internet of Things” (IoT): everyone is buzzing about it, everyone wants to leverage it, and everyone is a bit confused by it. The market is still immature, but there’s already a flurry of competing standards and technologies. Looking just at connectivity, developers could potentially be dealing with NFC, RFID, and Bluetooth LE. Given this early fragmentation and the wide range of potential applications, I think it’s a good bet that the IoT tools market will grow quickly over the course of the year. Locally, ThingWorx has already had a successful exit and the Boston Business Journal is already throwing around nicknames.

On the consumer side, the wearables market is already large and only projected to grow larger. Currently, the “activity tracker” space is fairly consolidated, but that’ll certainly change as devices emerge to track different metrics through different technologies. The net result is that anyone looking to aggregate data from a heterogeneous set of devices will face an uphill battle. To combat this, we’ll definitely see tools emerge to help manage the complexity and create uniform interfaces. RunKeeper’s Health Graph is an early player here and they’ll certainly continue to innovate.

Cryptocurrencies

Bitcoin (and *coin) baby! Even though an $8bn market cap isn’t enough to buy WhatsApp, it’s certainly nothing to sneeze at. At this point, it’s still too early to declare that Bitcoin is “here to stay”, but it’s definitely going to hang out for a bit. Given its immense disruptive potential and the archaic nature of existing financial infrastructure software, it’s almost a certainty that hugely successful “tool” companies will be built in the cryptocurrency space. From the “Bitcoin” versions of payment processors like Braintree to electronic brokerage software like Interactive Brokers, I think we’re going to see dozens of interesting cryptocurrency companies.

SaaS cloud wrangling

In the last few years, the number of SaaS products a typical company uses has grown dramatically. Nearly every function has been influenced by SaaS products; from marketing and sales to accounting and strategic planning, everyone’s data is now “in the cloud”. Despite the availability of APIs, it’s become increasingly difficult to extract, manipulate, and analyze all the data stored within cloud services. Ad-hoc reports that used to involve combining a few Excel sheets now might require costly custom development. Tom Rikert of a16z has a great post describing the early stages of companies addressing this market, and more will certainly follow suit. After the groundwork is laid, we’ll certainly see Google Now style “smart insights” to help companies discover new opportunities.

Making predictions is always risky business, and hopefully I won’t look back at these and facepalm. Anyway, I’d love to hear what anyone thinks in the comments.

Bitcoin: One vulnerability, two interesting questions

Over the last two weeks, there have been two high profile negative Bitcoin incidents. First up was Mt. Gox announcing that it was temporarily halting withdrawals, and soon after, Silk Road 2.0 announcing that it had been hacked and ~$2 million of BTC had been stolen. In both situations, the sites are blaming “transaction malleability”, a supposedly well-known Bitcoin exploit, as the root cause of the issues. Predictably, most of the commentary surrounding these incidents has been that they’re both in fact cover-ups for the site admins stealing the “lost” bitcoin. Regardless of what turns out to be true, both incidents are raising some interesting questions about Bitcoin.

As I understand it, the “transaction malleability” vulnerability is an implementation-specific issue that’s already been fixed in the “official” Bitcoin client. This directly contradicts what Mt. Gox announced, and one of the lead Bitcoin developers actually went as far as calling out Mt. Gox in Why Mt. Gox is full of shit. It isn’t clear if Mt. Gox is being intentionally dishonest, but this spat does raise the interesting issue of trusting the software that you’re using. Looking at the software we use on a daily basis, there’s a remarkable lack of transparency into how systems are built, whether they’ve been audited, and whether they’re composed of independently verifiable open source components. From the software that switches trains on tracks to the code that powers your cell phone, we generally don’t know how the sausage was ultimately made. In general, things seem to work “OK” without consumers knowing these details, but for people to be confident in Bitcoin payment systems they’ll ultimately demand transparency into the underlying implementations.
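For context, malleability stems from the fact that a transaction’s id is just a double SHA-256 hash of its fully serialized bytes, signatures included; since a valid signature can be re-encoded without invalidating it, the id can change while the payment itself stays the same. A minimal Python sketch (using made-up byte strings, not real transaction serializations) illustrates the mechanic:

```python
import hashlib

def txid(serialized_tx: bytes) -> str:
    """Bitcoin txids are the double SHA-256 of the fully serialized
    transaction, signatures included."""
    return hashlib.sha256(hashlib.sha256(serialized_tx).digest()).hexdigest()

# Made-up byte strings standing in for two serializations of the *same*
# payment: identical inputs and outputs, but the second re-encodes the
# (still valid) signature slightly differently.
tx_original = b"\x01<inputs>\x30\x44<sig encoding A><outputs>"
tx_mutated = b"\x01<inputs>\x30\x45<sig encoding B><outputs>"

# The network would accept both, yet they have different ids -- so
# software that tracks a payment by txid alone can be tricked into
# believing it never confirmed.
print(txid(tx_original) != txid(tx_mutated))  # True
```

This is why the bug is described as implementation-specific: the protocol itself is fine, but any wallet or exchange code that assumes txids are immutable identifiers is vulnerable.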

Another interesting point surfaced by this issue is the irreversibility of Bitcoin transactions. The Silk Road 2.0 announcement really highlights this, since they’re basically pleading with whoever stole the coins to “give them back”. It’s pretty clear that the inability to roll back transactions is going to make combating Bitcoin fraud a herculean task as the volume of transactions grows. Without a mechanism to “undo” a transaction, the majority of fraud prevention will have to rely on preemptively blocking transactions as opposed to remediating them after the fact. There are certainly benefits to irreversible transactions, but Bitcoin will definitely need a strategy to combat issues like this.

Anyway, I’m still bullish on Bitcoin; the community has shown that it’s resilient, and overall it’s definitely better to work out the kinks with $2 million instead of $200 million at stake. It looks like Mt. Gox is close to resuming normal activity, and Silk Road 2.0 has recently announced that it’ll reimburse everyone affected by the hack. Now if only the price would get back to $1000/coin…

Big Data: Amazon Redshift vs. Hive

In the last few months there have been a handful of blog posts basically themed “Redshift vs. Hive”. Companies from Airbnb to FlyData have been broadcasting their success in migrating from Hive to Redshift, in terms of both performance and cost. Unfortunately, a lot of casual observers have interpreted these posts to mean that Redshift is a “silver bullet” in the big data space. For some background, Hive is an abstraction layer that executes MapReduce jobs using Hadoop across data stored in HDFS. Amazon’s Redshift is a managed “petabyte scale” data warehouse solution that provides access to a ParAccel cluster and exposes a SQL interface roughly similar to PostgreSQL’s. So where does that leave us?

From the outside, Hive and Redshift look oddly similar. They both promise “petabyte” scale and linear scalability, and both expose a SQL-ish query syntax. On top of that, if you squint, they’re both available as managed AWS services, through Elastic MapReduce and of course Redshift itself. Unfortunately, that’s really where the similarities end, which makes the “Hive vs. Redshift” comparisons “apples to oranges”. Looking at Hive, its defining characteristic is that it runs on Hadoop and works on data stored in HDFS. Removing the acronym soup, that basically means Hive runs MapReduce jobs across a bunch of text files stored in a distributed file system (HDFS). In comparison, Redshift uses a data model similar to PostgreSQL’s, so data is structured in terms of rows and tables, with sort and distribution keys playing the role that indexes would in a traditional database.

OK so who cares?

Well, therein lies the rub that everyone seems to be missing. Hadoop, and by extension Hive (and Pig), is really good at processing text files. So imagine you have 10 million 1MB XML documents or 100GB worth of nginx logs; this would be a perfect use case for Hive. All you’d have to do is push them into HDFS or S3, write a regex to extract your data, and then query away. Need to add another 2 million documents or 20GB of logs? No problem, just get them into HDFS and you’re good to go.
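To make that concrete, here’s the kind of field-extracting regex you’d hand to Hive, sketched in Python rather than as an actual Hive table definition; the log layout is a simplified version of nginx’s standard “combined” format, and the field names are my own:

```python
import re

# A sketch of the regex a Hive RegexSerDe-style table would use to split
# nginx access-log lines into queryable columns. Field names (ip, ts,
# method, path, status, bytes) are illustrative assumptions.
NGINX_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)

line = '203.0.113.7 - - [12/Feb/2014:10:31:07 +0000] "GET /index.html HTTP/1.1" 200 512'
m = NGINX_LINE.match(line)
print(m.group("ip"), m.group("path"), m.group("status"))  # 203.0.113.7 /index.html 200
```

With Hive, that regex lives in the table definition, so “querying the logs” is just SQL over the raw files; no pre-processing step ever touches the data.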

Could you do this with Redshift? Sure, but you’d need to pre-process 10 million XML documents and 100GB of logs to extract the appropriate fields, and then create CSV files or SQL INSERT statements to load into Redshift. Given the available options, you’re probably going to end up using Hadoop to do this anyway.
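For contrast, here’s a sketch of that pre-processing step: flattening already-extracted fields into CSV for Redshift’s COPY command to bulk-load from S3. The field values, bucket, and table name are all hypothetical:

```python
import csv
import io

# Hypothetical records extracted from logs or XML in an earlier step.
records = [
    ("203.0.113.7", "/index.html", 200, 512),
    ("198.51.100.3", "/about", 404, 162),
]

# Flatten into CSV -- the flat-file format Redshift's COPY bulk-loads.
buf = io.StringIO()
csv.writer(buf).writerows(records)
print(buf.getvalue())

# After pushing the file to S3, loading is a single statement, e.g.:
#   COPY access_logs FROM 's3://my-bucket/access.csv' CSV;
```

The point isn’t that this is hard; it’s that you need this entire extra pipeline (and, at 100GB scale, probably Hadoop to run it) before Redshift can answer its first query.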

Where Redshift is really going to excel is in situations where your data is basically already relational and you have a clear path to actually get it into your cluster. For example, if you were running three 15GB MySQL databases with unique but related data, you’d be able to regularly pull that data into Redshift and then query it ad hoc with regular SQL. In addition, since the data is already structured, you’d be able to use the existing schema to define sort and distribution keys in Redshift to improve performance.
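As a sketch of that last point, here’s what mapping an existing schema onto Redshift might look like; the table and column names are hypothetical, and the DISTKEY/SORTKEY clauses are Redshift’s stand-ins for traditional indexes:

```python
# Hypothetical DDL for a table synced nightly from one of the MySQL
# databases. DISTKEY controls which node each row lands on (pick a
# common join column); SORTKEY controls on-disk ordering (pick a
# common range-filter column, e.g. a timestamp).
ORDERS_DDL = """
CREATE TABLE orders (
    order_id   BIGINT,
    user_id    BIGINT,
    created_at TIMESTAMP,
    total      DECIMAL(10,2)
)
DISTKEY(user_id)
SORTKEY(created_at);
"""

# Since Redshift speaks the PostgreSQL wire protocol, this runs over any
# PostgreSQL-compatible driver, e.g.:
#   cursor.execute(ORDERS_DDL)
print(ORDERS_DDL.strip().splitlines()[0])  # CREATE TABLE orders (
```

Because the MySQL schema already tells you which columns are joined and filtered on, choosing these keys is straightforward; with raw text files you’d have no such structure to exploit.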

Hammers, screws, etc

When it comes down to it, it’s the old “right tool for the job” aphorism. As an organization, you’ll have to evaluate how your data is structured, the types of queries you’re interested in running, and what level of abstraction you’re comfortable with. What’s definitely true is that “enterprise” data warehousing is being commoditized, and the “old guard” had better innovate or die.

Fun: What does a “better” rental real estate brokerage look like?

Note: I have zero real estate experience beyond renting apartments in Boston/Cambridge so obviously this is all just hearsay.

I was grabbing drinks with a buddy of mine earlier and we started chatting about “brick and mortar” businesses that for whatever reason weren’t being disrupted by technology. As we were throwing out ideas, one of the businesses that really captured both of us was rental real estate brokerages. Specifically, we were talking about those typically scummy brokerages that constantly post on Craigslist, show you a few apartments, and then put you through a painful experience to actually rent the place. I’m admittedly no expert, but of the four apartments I’ve rented, every experience has been terrible to a varying degree.

What makes them so bad?

The entire process of finding an apartment is pretty terrible, but ultimately most of the frustration boils down to dealing with brokers who are lazy or incompetent, inaccurate or incomplete data, and the absurdity of having to drop off paper forms…in 2014. Venturing into specific gripes wouldn’t be useful since they’re anecdotal, but my general sense is that the majority of Boston/Cambridge renters aren’t thrilled with their broker experiences.

A playbook for a better brokerage

At a high level, success here will be driven by building a company culture of excellence and customer service. You’ll have to take Tony Hsieh’s playbook from Zappos, adapt it to running a brokerage, and then ferociously build a culture to support it. Concretely, that’ll translate to hiring individuals with high emotional intelligence, trusting them to make decisions, and then buying or building the right tools to make it happen. OK great, we’re knocking off a famous management philosophy and hiring “awesome people”, so how are we actually going to run this thing?

Don’t just pay on commission: This is entirely secondhand, but my understanding is that most of the brokerages in Cambridge/Boston pay agents entirely on commission. The net result seems to be that agents spend a lot of time chasing crappy deals and have no incentive to actively help the brokerage. We’re going to pay an hourly rate along with a lower commission based on a combination of factors beyond just the number of deals closed.

Pick a price tier and own it: All the brokerages I’ve interacted with were trying to move apartments across the entire pricing spectrum, from $800/month studios in sketchy neighborhoods to premium two bedrooms at $3200/month in desirable locations. From the brokerage’s point of view it makes perfect sense: since they pay on commission, they really don’t care if their agents burn hours on low margin apartments – a rental is still money in their pockets. We’re doing it differently: pick a price range and own it. Intuitively, the best range to focus on seems to be moderately high priced multi-bedroom apartments, in order to optimize both demand and fees captured.

Qualified lead gen: As an outsider looking in, a significant challenge for this strategy is going to be keeping a pipeline of qualified leads. Instead of waiting for people to “drop in”, we’re going to be proactive about identifying, meeting, and connecting with potential renters before they’re actively looking. From attending startup events to sponsoring events for graduating seniors, we’ll be top of mind for potential renters who will certainly have a future need.

Social and email: None of the brokerages I’ve used ever asked for my email address; guess how many got repeat business? It’s 2014: social and email are critically important channels for winning customers, driving referrals, and building a brand. We’ll start small, using Twitter and Facebook to connect with potential leads, and then leverage email to send follow-ups, ask for referrals, and hopefully win repeat business. After that, we’ll start experimenting with Facebook ads and display ads.

High quality photos and accurate data: Photos matter, a lot. We’re going to source our own high quality photos of every apartment we show. After a year or two, we’ll end up with the best sets of photos for some of the most expensive apartments in the city. On top of that, we’ll be gathering clean, structured data about all of the apartments we’re showing and renting. With this data, our listings will be the most attractive, and we’ll also be able to place clients using only our own internal datasets.

Make the paperwork not suck: We’re going to end the frustration of dealing with paper forms. Renters will be able to pay their deposit online with a credit card (+2.5% fee), fill out the MA rental agreement online, and we’ll actually have them credit-checked before they get that far. We close faster, fewer deals fall apart, and everything is digital. I know companies like RocketLease are already playing in this space and they’d be a perfect partner.

Access better inventory: Unfortunately, this one is an exercise for the reader. Beyond hooking into the public MLS feed and tapping into syndication services like You Got Listings, I’m not familiar enough with the real estate market to speak to how to get better listings. Would love to hear any ideas in the comments though!

Anyway, there’s obviously more to running a successful brokerage, but looking at my experiences renting and the techniques that have worked in other industries, I think it would be possible to build a customer focused, technology powered brokerage that was extremely competitive.

Boston Tech Startup Spotlight: Recorded Future

Boston is one of the most active places in the US for technology innovation and home to hundreds of exciting young companies with incredible new ideas. In support of the Boston tech startup scene, I have been publishing a series of short blog posts spotlighting some of our most interesting neighbors.

Due to our continued fascination with big data and support for companies playing in the space, it seemed only logical to write about Recorded Future for this edition. They’re also headquartered in Cambridge, with offices in Göteborg, Sweden and Arlington, VA.

They constantly collect real-time data from web sources such as news, blogs, and public social media and use their technology to analyze trends and identify past, present, and future events. These events are then linked to the people, places, and organizations that matter to their clients, who include Fortune 500 companies and leading government agencies.

Recorded Future’s team of computer scientists, statisticians, linguists, and technical business people offer up an array of software products and services centered around web intelligence. They also provide the Recorded Future API, a web service that allows developers to get in on the action by accessing Recorded Future’s index for large scale analysis of online media flow.

If you’re interested, there’s lots more about their products and services on their website.

Stay tuned for the next startup spotlight.