Bitcoin: One vulnerability, two interesting questions

Over the last two weeks, there have been two high-profile negative Bitcoin incidents. First up was Mt. Gox announcing that it was temporarily halting withdrawals, and soon after, Silk Road 2.0 announced that it had been hacked and ~$2 million of BTC had been stolen. In both situations, the sites are blaming “transaction malleability”, supposedly a well-known Bitcoin exploit, as the root cause. Predictably, most of the commentary surrounding both incidents has been that they’re in fact cover-ups for the site admins stealing the “lost” bitcoin. Regardless of what turns out to be true, both incidents raise some interesting questions about Bitcoin.

As I understand it, the “transaction malleability” vulnerability is an implementation-specific issue that’s already been fixed in the “official” Bitcoin client. This directly contradicts what Mt. Gox announced, and one of the lead Bitcoin developers went as far as calling out Mt. Gox in “Why Mt. Gox is full of shit”. It isn’t clear whether Mt. Gox is being intentionally dishonest, but this spat does raise an interesting issue: trusting the software you’re using. Looking at the software we use on a daily basis, there’s a remarkable lack of transparency into how systems are built, whether they’ve been audited, and whether they’re composed of independently verifiable open source components. From the software that switches trains on tracks to the code that powers our cell phones, we generally don’t know how the sausage was made. Things seem to work “OK” without consumers knowing these details, but for people to be confident in Bitcoin payment systems, they’ll ultimately demand transparency into the underlying implementations.
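To make the malleability claim a bit more concrete: at the time of these incidents, a Bitcoin transaction ID was the double SHA-256 hash of the serialized transaction, signatures included. If someone relays a copy whose signature has been re-encoded in an equivalent, still-valid form, the payment is unchanged but the ID is different. This is a minimal sketch using made-up byte strings rather than real Bitcoin serialization; `txid` and `was_confirmed` are hypothetical helper names for illustration.

```python
from __future__ import annotations

import hashlib


def txid(raw_tx: bytes) -> str:
    """Bitcoin-style transaction ID: SHA-256 applied twice to the serialized
    transaction, shown as hex in reversed byte order."""
    digest = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return digest[::-1].hex()


def was_confirmed(expected_id: str, confirmed_txs: list[bytes]) -> bool:
    """Hypothetical naive check: look only for the exact ID we broadcast."""
    return any(txid(tx) == expected_id for tx in confirmed_txs)


# Toy stand-ins, NOT real Bitcoin serialization. The payment details are
# identical; only the (hypothetically re-encoded) signature bytes differ.
payment = b"1.5 BTC: exchange -> customer |"
sig_as_broadcast = b"sig=3045...original"
sig_reencoded = b"sig=3046...equivalent"

broadcast_id = txid(payment + sig_as_broadcast)
mutated_tx = payment + sig_reencoded  # the copy that actually got mined

print(broadcast_id == txid(mutated_tx))           # False
print(was_confirmed(broadcast_id, [mutated_tx]))  # False, even though the payment went through
```

An exchange that only asks “did the ID I broadcast confirm?” would see that ID never confirm, which is roughly the failure mode Mt. Gox described. Tracking payments by their actual inputs and outputs rather than by ID sidesteps the problem.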

Another interesting point surfaced by this issue is the irreversibility of Bitcoin transactions. The Silk Road 2.0 announcement really highlights this, since it’s basically pleading with whoever stole the coins to “give them back”. It’s pretty clear that the inability to roll back transactions is going to make combating Bitcoin fraud a Herculean task as transaction volume grows. Without a mechanism to “undo” a transaction, most fraud prevention will have to rely on blocking suspicious transactions before they happen rather than mediating disputes after the fact. There are certainly benefits to irreversible transactions, but Bitcoin will definitely need a strategy for dealing with issues like this.

Anyway, I’m still bullish on Bitcoin. The community has shown that it’s resilient, and overall it’s definitely better to work out the kinks with $2 million at stake instead of $200 million. It looks like Mt. Gox is close to resuming normal activity, and Silk Road 2.0 has recently announced that it’ll reimburse everyone affected by the hack. Now if only the price would get back to $1000/coin…

Big Data: Amazon Redshift vs. Hive

In the last few months there have been a handful of blog posts themed “Redshift vs. Hive”. Companies from Airbnb to FlyData have been broadcasting their success migrating from Hive to Redshift, citing both performance and cost. Unfortunately, a lot of casual observers have interpreted these posts to mean that Redshift is a “silver bullet” in the big data space. For some background, Hive is an abstraction layer that turns SQL-like queries into MapReduce jobs that Hadoop runs across data stored in HDFS. Amazon Redshift is a managed “petabyte scale” data warehouse service, built on ParAccel technology, that exposes a SQL interface roughly similar to PostgreSQL. So where does that leave us?

From the outside, Hive and Redshift look oddly similar. They both promise “petabyte” scale and linear scalability, and both expose a SQL-ish query syntax. On top of that, if you squint, they’re both available as Amazon Web Services managed offerings: Hive through Elastic MapReduce and, of course, Redshift itself. Unfortunately, that’s really where the similarities end, which makes “Hive vs. Redshift” comparisons a case of apples to oranges. Hive’s defining characteristic is that it runs on Hadoop and works on data stored in HDFS. Removing the acronym soup, that basically means Hive runs MapReduce jobs across a bunch of text files stored in a distributed file system (HDFS). In comparison, Redshift uses a relational data model similar to PostgreSQL, so data is structured as tables with typed columns; instead of traditional indexes, Redshift relies on sort keys and distribution keys to speed up queries.
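As a rough mental model of what “runs MapReduce jobs across text files” means, here is a toy, single-process sketch of the map, shuffle, and reduce phases that a query like `SELECT status, COUNT(*) ... GROUP BY status` boils down to. It is an illustration under stated assumptions, not Hive’s actual execution plan; the input format (a path followed by a status code) and the function names are made up.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: parse each text line and emit (key, 1). Assumes a made-up
    format where the second whitespace-separated field is a status code."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip malformed lines instead of failing
            yield fields[1], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key (Hadoop does this across machines)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: collapse each key's values; COUNT(*) is just a sum of 1s."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["/index 200", "/about 404", "/index 200", "/login 500"]
print(reduce_phase(shuffle(map_phase(lines))))  # {'200': 2, '404': 1, '500': 1}
```

On a real cluster, many mappers and reducers run in parallel and the shuffle moves data over the network, which is why adding more files mostly just means scheduling more map tasks.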

OK, so who cares?

Well, therein lies the rub that everyone seems to be missing. Hadoop (and by extension Hive and Pig) is really good at processing text files. Imagine you have 10 million 1 MB XML documents or 100GB worth of nginx logs: this would be a perfect use case for Hive. All you would have to do is push them into HDFS or S3, write a regex to extract your data, and then query away. Need to add another 2 million documents or 20GB of logs? No problem, just get them into HDFS and you’re good to go.
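The “write a regex and query away” step is the heart of the Hive approach: Hive can apply a regular expression to each raw line through its regex SerDe, so each capture group becomes a column. Here is the same extraction idea as a small Python sketch, assuming the logs use nginx’s default `combined` format; `COMBINED` and `parse_line` are illustrative names.

```python
import re

# nginx's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


sample = ('203.0.113.7 - - [10/Feb/2014:13:55:36 -0500] "GET /index.html HTTP/1.1" '
          '200 2326 "-" "Mozilla/5.0"')
row = parse_line(sample)
print(row["status"], row["request"])  # 200 GET /index.html HTTP/1.1
```

Lines that don’t match return `None`, so they can be counted or skipped rather than crashing the whole job.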

Could you do this with Redshift? Sure, but you’d need to pre-process 10 million XML documents and 100GB of logs to extract the appropriate fields, and then create CSV files or SQL INSERT statements to load into Redshift. Given the available options, you’re probably going to end up using Hadoop to do this anyway.
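For comparison, the Redshift path needs an explicit flattening step before anything is loaded. A minimal sketch, assuming a hypothetical `<order>` document shape: pull the fields out of each XML document with the standard library and write them as CSV rows, the kind of file you could stage in S3 and load with Redshift’s `COPY` command. The document shape, field names, and `flatten` helper are all made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical documents; real ones would need their own field paths.
docs = [
    '<order id="1001"><customer>Alice</customer><total>19.99</total></order>',
    '<order id="1002"><customer>Bob, Jr.</customer><total>5.00</total></order>',
]


def flatten(xml_text):
    """Extract one flat row (order_id, customer, total) from an XML document."""
    root = ET.fromstring(xml_text)
    return [root.get("id"), root.findtext("customer"), root.findtext("total")]


buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["order_id", "customer", "total"])  # header row
for doc in docs:
    writer.writerow(flatten(doc))

print(buf.getvalue())
```

The `csv` module takes care of quoting values like `Bob, Jr.`; if you keep the header row, `COPY` can skip it with its `IGNOREHEADER` option. At 10 million documents, though, you would likely want to run this same logic as a Hadoop job, which is the point above.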

Where Redshift will really excel is in situations where your data is already relational and you have a clear path to actually get it into your cluster. For example, if you were running three 15GB MySQL databases with unique but related data, you’d be able to regularly pull that data into Redshift and then query it ad hoc with regular SQL. In addition, since the data is already structured, you’d be able to use its existing schema to choose sort and distribution keys in Redshift to improve performance.
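A minimal sketch of that workflow, using Python’s built-in `sqlite3` as a local stand-in for Redshift (the SQL dialects differ, so treat this as illustrative only). The three tables play the role of extracts from three hypothetical MySQL databases; once they sit side by side, a single ad-hoc query can answer a question that spans all of them.

```python
import sqlite3

# Hypothetical extracts from three separate MySQL databases.
users = [(1, "alice"), (2, "bob")]
orders = [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)]
tickets = [(100, 2, "refund"), (101, 2, "shipping")]

conn = sqlite3.connect(":memory:")  # local stand-in for the warehouse
conn.executescript("""
CREATE TABLE users   (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, user_id INTEGER, topic TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", users)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
conn.executemany("INSERT INTO tickets VALUES (?, ?, ?)", tickets)

# Ad-hoc question spanning all three sources: total spend and ticket count per user.
# Correlated subqueries avoid double counting that a naive three-way join would cause.
query = """
SELECT u.name,
       (SELECT COALESCE(SUM(o.amount), 0) FROM orders o WHERE o.user_id = u.user_id) AS spend,
       (SELECT COUNT(*) FROM tickets t WHERE t.user_id = u.user_id) AS ticket_count
FROM users u
ORDER BY u.name
"""
for row in conn.execute(query):
    print(row)
# ('alice', 65.0, 0)
# ('bob', 15.0, 2)
```

In Redshift itself, a shared join column such as `user_id` would be a natural candidate for a distribution key, so related rows from the different sources land on the same node.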

Hammers, screws, etc.

When it comes down to it, this is the old “right tool for the right job” aphorism. As an organization, you’ll have to evaluate how your data is structured, the types of queries you’re interested in running, and what level of abstraction you’re comfortable with. What’s definitely true is that “enterprise” data warehousing is being commoditized, and the “old guard” had better innovate or die.

Fun: What does a “better” rental real estate brokerage look like?

Note: I have zero real estate experience beyond renting apartments in Boston/Cambridge so obviously this is all just hearsay.

I was grabbing drinks with a buddy of mine earlier, and we started chatting about “brick and mortar” businesses that for whatever reason weren’t being disrupted by technology. As we were throwing out ideas, one of the businesses that really captured both of our attention was rental real estate brokerage. Specifically, we were talking about those typically scummy brokerages that constantly post on Craigslist, show you a few apartments, and then put you through a painful process to actually rent the place. I’m admittedly no expert, but of the four apartments I’ve rented, every experience has been terrible to varying degrees.

What makes them so bad?

The entire process of finding an apartment is pretty terrible, but most of the frustrations boil down to lazy or incompetent brokers, inaccurate or incomplete data, and the absurdity of having to drop off paper forms…in 2014. Venturing into specific gripes wouldn’t be useful since they’re anecdotal, but my general sense is that the majority of Boston/Cambridge renters aren’t thrilled with their broker experiences.

A playbook for a better brokerage

At a high level, success here will be driven by building a company culture of excellence and customer service. You’ll have to take Tony Hsieh’s playbook from Zappos, adapt it to running a brokerage, and then ferociously build a culture to support it. Concretely, that’ll translate to hiring individuals with high emotional intelligence, trusting them to make decisions, and then buying or building the right tools to make it happen. OK, great: we’re knocking off a famous management philosophy and hiring “awesome people.” How are we actually running this thing?

Don’t just pay on commission: This is entirely secondhand, but my understanding is that most of the brokerages in Cambridge/Boston pay agents entirely on commission. The net result seems to be that agents spend a lot of time chasing crappy deals and have no incentive to actively help the brokerage. We’re going to pay an hourly rate along with a lower commission based on a combination of factors beyond just the number of deals closed.

Pick a price tier and own it: At all the brokerages I’ve interacted with, agents were trying to move apartments across the entire pricing spectrum, from $800/month studios in sketchy neighborhoods to premium 2 bedrooms at $3200/month in desirable locations. From the brokerage’s point of view this makes perfect sense: since they pay on commission, they really don’t care if their agents burn hours on low margin apartments, because a rental is still money in their pockets. We’re doing it differently: pick a price range and own it. Intuitively, the best range to focus on seems to be moderately high priced multi-bedroom apartments, in order to optimize for both demand and fees captured.

Qualified lead gen: As an outsider looking in, a significant challenge for the strategy we’re outlining is keeping a pipeline of qualified leads. Instead of waiting for people to “drop in”, we’re going to proactively identify, meet, and connect with potential renters before they’re actively looking. From attending startup events to sponsoring events for graduating seniors, we’ll be top of mind for potential renters who will certainly have a future need.

Social and email: None of the brokerages I’ve used ever asked for my email address; guess how many got repeat business? It’s 2014, and social and email are critically important channels for winning customers, driving referrals, and building a brand. We’ll start small, using Twitter and Facebook to connect with potential leads and then email to follow up, ask for referrals, and hopefully win repeat business. After that, we’ll start experimenting with Facebook ads and display ads.

High-quality photos and accurate data: Photos matter, a lot. We’re going to source our own high-quality photos of every apartment we show. After a year or two, we’ll end up with the best sets of photos for some of the most expensive apartments in the city. On top of that, we’ll gather clean, structured data about all of the apartments we’re showing and renting. With this data, our listings will be the most attractive, and we’ll also be able to place clients using only our own internal datasets.
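What “clean, structured data” might look like in practice: a minimal sketch with hypothetical `Listing` and `ClientCriteria` records and a `place` function that matches a client against in-house inventory. Every field, address, and number here is made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Hypothetical structured record for one apartment we show."""
    address: str
    neighborhood: str
    bedrooms: int
    rent: int            # monthly rent in USD
    available: str       # ISO date, e.g. "2014-09-01"
    photo_urls: tuple[str, ...] = ()


@dataclass(frozen=True)
class ClientCriteria:
    """Hypothetical summary of what a renter told us they need."""
    neighborhoods: frozenset[str]
    min_bedrooms: int
    max_rent: int
    move_in: str         # ISO date


def matches(listing: Listing, client: ClientCriteria) -> bool:
    return (
        listing.neighborhood in client.neighborhoods
        and listing.bedrooms >= client.min_bedrooms
        and listing.rent <= client.max_rent
        # ISO-formatted dates compare correctly as plain strings
        and listing.available <= client.move_in
    )


def place(client: ClientCriteria, inventory: list[Listing]) -> list[Listing]:
    """Return matching units from our own inventory, cheapest first."""
    return sorted((unit for unit in inventory if matches(unit, client)),
                  key=lambda unit: unit.rent)


inventory = [
    Listing("1 Example St", "Central Square", 3, 3200, "2014-09-01"),
    Listing("2 Example St", "Central Square", 2, 2800, "2014-08-15"),
    Listing("3 Example St", "Back Bay", 3, 4100, "2014-09-01"),
]
client = ClientCriteria(frozenset({"Central Square", "Back Bay"}), 2, 3500, "2014-09-01")
print([unit.address for unit in place(client, inventory)])  # ['2 Example St', '1 Example St']
```

The point of keeping records this structured is that matching becomes a query over our own data instead of a broker’s memory.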

Make the paperwork not suck: We’re going to end the frustration of dealing with paper forms. Renters will be able to pay their deposit online with a credit card (+2.5% fee), fill out the MA rental agreement online, and will already have been credit checked before they get this far. Deals close faster, fewer fall apart, and everything is digital. I know companies like RocketLease are already playing in this space, and they’d be a perfect partner.

Access better inventory: Unfortunately, this is an exercise for the reader. Beyond hooking into the public MLS feed and tapping into syndication services like You Got Listings, I’m not familiar enough with the real estate market to speak to how to get better listings. I’d love to hear any ideas in the comments, though!

Anyway, there’s obviously more to running a successful brokerage, but based on my experiences renting and on techniques that have worked in other industries, I think it would be possible to build a customer-focused, technology-powered brokerage that is extremely competitive.

Boston Tech Startup Spotlight: Recorded Future

Boston is one of the most active places in the US for technology innovation and home to hundreds of exciting young companies with incredible new ideas. In support of the Boston tech startup scene, I have been publishing a series of short blog posts spotlighting some of our most interesting neighbors.

Given our continued fascination with big data and our support for companies playing in the space, it seemed only logical to write about Recorded Future for this edition. These guys are also headquartered in Cambridge, with offices in Göteborg, Sweden and Arlington, VA.

They constantly collect real-time data from web sources such as news, blogs, and public social media and use their technology to analyze trends and identify past, present, and future events. These events are then linked to the people, places, and organizations that matter to their clients, who include Fortune 500 companies and leading government agencies.

Recorded Future’s team of computer scientists, statisticians, linguists, and technical business people offer up an array of software products and services centered around web intelligence. They also provide the Recorded Future API, a web service that allows developers to get in on the action by accessing Recorded Future’s index for large scale analysis of online media flow.

If you’re interested, there’s lots more about their products and services on their website.

Stay tuned for the next startup spotlight.

Musing: Should everyone learn to code?

Last week, President Obama made headlines by suggesting that every American in school should learn how to code. Predictably, the comment sparked heated discussion across the web, from Fred Wilson’s blog to several threads on Hacker News. Surprisingly, some of the viewpoints were extremely polarized, ranging from “it’s useless, some people will never get it” to “of course!”. Personally, I think everyone should be exposed to some form of programming while they’re in school.

An inescapable reality is that in 2013, computers are part of everyone’s personal and professional day-to-day life. From non-technical roles in technical fields, like account managers or project managers, to traditionally non-technical jobs like teachers, everyone is ultimately interacting with computers daily. With that in mind, a basic understanding of how computing abstractions and programming work will benefit everyone. From modifying a VBA macro to constructing a complex Gmail search query, understanding how the pieces fit together certainly can’t hurt.

Looking back at high school, the common analogy between studying programming and studying a foreign language isn’t really accurate. A better analogy is the general experience people have studying math in middle and high school. For people who don’t take a math class in college, that’s normally the last time they study math in an academic setting. Although most people forget the details they learned, they still retain the overarching fundamentals of how things like algebra and geometry work. Because of this, when people are faced with a basic math problem, they generally know what they need to look up in order to solve it. Extending this, if people were introduced to basic programming early on, they’d have a sense that there might be an easier way to approach certain tasks. Need to format a list of names in Excel? There might be a function for that.
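To make that concrete, here is the kind of tiny program that early exposure to programming makes approachable: turning names typed as “last, FIRST” into “First Last”. It is a hypothetical example with a made-up `format_name` helper; in Excel, the built-in `PROPER` function handles the capitalization part on its own.

```python
def format_name(raw):
    """Turn 'last, first' into 'First Last'. Assumes at most one comma;
    input without a comma is just trimmed and title-cased."""
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
        return f"{first} {last}".title()
    return raw.strip().title()


names = ["smith, JANE", "  doe,john ", "Ada Lovelace"]
print([format_name(n) for n in names])  # ['Jane Smith', 'John Doe', 'Ada Lovelace']
```

Knowing the pattern matters more than the syntax. Note that `title()` would also turn “McDonald” into “Mcdonald”, a reminder that even a five-line script deserves a sanity check.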

So how can we make this happen? The good news is there’s already a push to make high quality, programming-focused educational material available to everyone. Dozens of projects, from massive open online course platforms like Coursera to Khan Academy and Codecademy, already provide free, interactive computer science resources for everyone. The next step is pushing states and school systems to actively adopt CS education for their middle and high school students. Hopefully it’ll prove an easy and effective step toward keeping everyone competitive in an increasingly technology-powered workplace.