General Archives - Page 11 of 26 - {5} Setfive

Net Neutrality has been all over the news lately and I’ve been fielding a couple of questions related to it. At Setfive, we think it’s a critically important issue, both to startups and the technology infrastructure of the United States as a whole. Because of that, we decided to pull together an overview, some history, and key outcomes surrounding the Net Neutrality debate. As always, questions or comments welcome!

What is Net Neutrality?
First coined by Columbia Law professor Tim Wu, network neutrality, or net neutrality for short, is the principle that internet service providers (such as Verizon and Comcast) and governments should treat all content and data equally, regardless of where it comes from. Under it, internet service providers (ISPs) are not allowed to discriminate by slowing speeds for one company in favor of its competitor.

Essentially, net neutrality maintains a free, open, and fair internet.

The Lead Up To January 14, 2014

In 2002, the FCC had the opportunity to regulate ISPs as it had done for the phone companies. Ultimately, though, the FCC chose not to, reasoning that ISPs are “information services”, completely different from the telecommunication services phone companies provide.
However, a few years later, the FCC began to notice the enormous power that ISPs had accumulated. In an attempt to curb and regulate them, the FCC created the Open Internet Rules in 2010.

The Open Internet Rules:

Enforced transparency in how ISPs operate and manage their networks
Prohibited ISPs from obstructing access to legal content and applications
Maintained an equal and fair playing field online by preventing ISPs from giving preference to one company over another, which is essentially the core of net neutrality

In response to these rules, Verizon took the FCC to court in 2013, arguing that the agency had no authority to use the Open Internet Rules to regulate ISPs.

Fast forward to January 14, 2014

On this day, the D.C. Circuit Court of Appeals ruled in Verizon Communications Inc. v. FCC that portions of the Open Internet Rules, especially the ones pertaining to an equal and fair internet, could not be applied to ISPs.
The reasoning was that those portions of the rules apply only to common carriers, which provide telecommunication services. Since ISPs are classified by the FCC as providers of information services, they are not considered common carriers under the law.

What does this ruling mean?
It eliminated the only existing rules protecting net neutrality. As a result, ISPs can now:

Charge companies fees for “premium” access to their consumers. Think Verizon charging Netflix to stream to their customers at better rates.
Selectively prioritize one source of traffic over another. Think Comcast prioritizing delivering its Xfinity onDemand service over HBO Go.
And of course, create “slow lanes” and “fast lanes”, paving the way to charging for à la carte Internet packages, just like TV. Imagine seeing errors like: “Sorry! You need to subscribe to the ‘social package’ to access this site.”

What’s the president’s stance on all this?
He’s pro-net neutrality and has urged the FCC to establish strong rules that would protect it. However, since the FCC is an independent government agency, Obama has no direct influence over it. Additionally, in a bitterly divided Congress, some hardline Republicans are taking an anti-net neutrality stance to pander to their base. See The Oatmeal on Ted Cruz.

What’s next?
The FCC does have the power to reclassify ISPs as telecommunication service providers and thus subject them to the Open Internet Rules. What it decided to do instead is create a new net neutrality framework that would hold up in court while at the same time satisfying both sides.

Right now, everyone is in a holding pattern waiting for the FCC to make a final announcement.

Welcome to the weekend! We’ve rounded up some interesting reading to carry you through till Monday. Fire up your iPad, grab some cider, and snuggle up with a blanket:

Why are CVS and Rite Aid blocking Apple Pay?

As members of MCX, a group of retailers hoping to create an alternate payment solution, CVS and Rite Aid are blocking Apple Pay in hopes of slowing its adoption. Unfortunately for the MCX consortium, numerous disagreements about the system’s foundational premises have made it almost impossible to launch after 4 years of development. Keep reading >

Apple Pay, partnerships and software as disruption

In recent years, Apple has changed the way industries look and operate. It has done so in three ways. First, it builds tightly integrated products. Second, it makes a major push to partner with companies that can fill in its gaps. Third, while a partnership with Apple may look great, Apple has a tendency to move an entire industry into software. And so the question left is: what’s next for Apple and Apple Pay? Keep reading >

“Contact Me” surprisingly drives sales

Contrary to what you might think, having prospects contact a sales representative actually brings in more revenue than free trials can. Here is why and how you can capitalize on this. Keep reading >

A Few Non-Obvious Things I Learned as a New VC

For you future VCs out there, here are 7 essential insider tips you need to know to take the VC world by storm. Key takeaway? You should consider each investment a marriage without the easy divorce option. Keep reading >

FCC proposes net neutrality compromise. Everyone hates it.

In its effort to make both sides happy in the fierce battle over net neutrality, the FCC is considering a plan that would separate broadband into two services, one as retail and another for back-end. Unfortunately for them, instead of the plan satisfying both sides as it should, it has only fueled more outrage. Keep reading >

Last week, I was catching up with some friends when one of them asked an interesting question – Which Boston area companies are currently hiring PHP developers? Surprisingly, I didn’t really have a good answer so I decided to find out. To figure this out, I searched job posts that were specifically looking for PHP developers and started pulling together a spreadsheet about the posts. As I was looking at the data, I decided to put together a graphic which is available below along with the list of companies. As always, questions or comments welcome!

Company	City
Acquia	Burlington
ADTRAN	Burlington
Allen & Gerritsen	Boston
Applause	Framingham
Arbor Networks	Burlington
Berklee College Of Music	Boston
Biogen Idec	Cambridge
Black Duck Software	Burlington
Blue State Digital	Boston
Brafton Inc.	Boston
Brigham And Women’s Hospital	Wellesley
Brightcove	Boston
Catalina Marketing	Boston
Comsol	Burlington
Constant Contact	Waltham
ContentLEAD	Boston
D50 Media	Wellesley
Demandware	Burlington
Desire2Learn (D2L)	Boston
Dew Softech (contract Position)	Boston
Digital Bungalow	Salem
Dynatrace	Waltham
Egenerationmarketing	Boston
FASTHockey	Boston
Flipkey, Inc.	Boston
Genscape, Inc.	Boston
Harvard Medical School	Boston
Harvard School Of Public Health	Boston
Hill Holliday	Boston
Hubspot, Inc.	Cambridge
Integrated Computer Solutions	Bedford
Intersystems	Cambridge
Mediamath	Cambridge
Medtouch	Cambridge
MIT	Cambridge
Modo Labs	Cambridge
Motus (crs)	Boston
Namemedia	Waltham
Nanigans	Boston
Northeastern University	Boston
Northpoint Digital	Boston
Nutraclick	Boston
Pegasystems	Cambridge
Placester	Boston
Polar Design	Woburn
Sevone, Inc.	Boston
Silversky	Boston
Smartertravel.com	Boston
Source Of Future Technology, Inc.	Cambridge
Studypoint	Boston
Surfmerchants LLC	Boston
Tatto Media	Boston
Tufts University	Boston
Umass Boston	Boston
Unitrends	Burlington
Wayfair	Boston
Zipcar	Boston

It’s been a long week but you’ve made it: it’s Friday! Nothing goes better with Fridays than a couple of fresh links for your ride home and, of course, a cold beer. We can’t help you with that beer but we’ve got you covered on those links. A slew of new wearable health products were released this week and here they are:

Is Google Fit a fit in your life?

Google Fit is a new app that lets you track your daily activities and progress towards your fitness goals. You already use 2 other fitness apps? No problem! Google Fit compiles the data from all the apps you’re using into one simple application. Keep reading >

Fitness right on your wrist

The latest Microsoft product has finally come out. Microsoft Health and Microsoft Band are the company’s new fitness service and wearable, which give you insights on your health in real time and suggest actions to take to reach your fitness goals. You can get the latest news, texts, weather, e-mails, reminders, and of course your daily physical activity right on your wrist. Keep reading >

Welcome to the Fitbit family

Meet the newest additions to the Fitbit family: Charge, Charge HR, and Surge. These stylish, innovative fitness trackers have everything you need to reach your fitness goals. They feature all-day activity tracking, Fitbit’s own PurePulse technology for 24/7 heart rate monitoring, built-in GPS, and caller ID, to name a few. Keep reading >

Planning on picking up a fitness tracker? Let us know in the comments!

Over the last few weeks we’ve been utilizing Gearman to help us do some realtime stream processing. In production, what we’ve basically been doing is reading messages off an Amazon Kinesis stream, creating jobs in Gearman for anything that’s computationally expensive, and then gathering up the processed data for a batched insert into Amazon Redshift on a Gearman job as well. Conceptually, this workflow is reasonably similar to how MapReduce works where a series of input jobs is transformed by “mappers” and then results are collected in a “reduce” step.

From a practical point of view, using Gearman like this offers some interesting benefits:

Adding additional “map” capacity is relatively straightforward since you can just add additional machines that connect to the Gearman server.
Developing and testing the “map” and “reduce” functionality is easy since nothing is shared and you can run the code directly, independently of Gearman.
In our experience so far, the Gearman server can handle a high volume of jobs – we’ve pushed ~300/sec without a problem.
Since Gearman clients exist for dozens of languages, you could write different pieces of the system in whatever language fits best.

Overview

OK, so how does all of this actually work? For the purposes of a demonstration, let’s assume you’ve been tasked with scraping the META keywords and descriptions from a few hundred thousand sites and counting up word frequencies across all the sites. Assuming you were doing this in straight PHP, you’d end up with code that looks something like this.
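
The original listing is not reproduced here, so what follows is only a minimal sketch of a sequential scraper in plain PHP, not the code from the project. The function names extractMetaWords and scrapeSequentially are made up for illustration, and the fetcher is passed in as a callable so the loop can be tried without touching the network.

```php
<?php
// Hypothetical helper: pull the words out of a page's META keywords and description.
function extractMetaWords(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world markup is often invalid, so parser warnings are suppressed.
    @$doc->loadHTML($html);

    $words = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name !== 'keywords' && $name !== 'description') {
            continue;
        }
        // Split on anything that is not a letter or digit and drop empty pieces.
        $content = strtolower($meta->getAttribute('content'));
        $parts = preg_split('/[^a-z0-9]+/', $content, -1, PREG_SPLIT_NO_EMPTY);
        $words = array_merge($words, $parts);
    }
    return $words;
}

// Fetch each URL one after another and keep a running word count.
// $fetch is an assumption of this sketch: any callable that returns HTML or false.
function scrapeSequentially(array $urls, ?callable $fetch = null): array
{
    $fetch = $fetch ?? function (string $url) {
        return @file_get_contents($url);
    };

    $counts = [];
    foreach ($urls as $url) {
        $html = $fetch($url); // blocks until this one request finishes
        if ($html === false || $html === '') {
            continue; // skip sites that fail to load
        }
        foreach (extractMetaWords($html) as $word) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent words first
    return $counts;
}
```

Calling scrapeSequentially(file('100sites.txt', FILE_IGNORE_NEW_LINES)) would walk the list one request at a time, which is exactly the bottleneck discussed next.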

The problem is that since you’re making the requests sequentially, scraping a significant number of URLs is going to take an intractable amount of time. What we really want to do is fetch the URLs in parallel, extract the META keywords, and then combine all that data in a single data structure.

To keep the amount of code down, I used the Symfony2 Console component, Guzzle and Monolog to provide infrastructure around the project. Walking through the files of interest:

GearmanCommand.php: Command to execute either the “node” or the “master” Gearman workers.
StartScrapeCommand.php: Command to create the Gearman jobs to start the scrapers
Master.php: The code to gather up all the extracted keywords and maintain a running count.
Node.php: Worker code to extract the meta keywords from a given URL
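
To make the split between these roles concrete, here is a rough sketch using the PECL gearman extension. It is not the code from the repository: the job names “processUrl” and “countKeywords” are the ones that appear later in this post, but the function names, the JSON payload format, and the $extract callable are assumptions made for illustration. How the real Master.php decides it is finished and writes its results file is left out.

```php
<?php
// Pure "reduce" step: fold one page's keywords into the running totals.
function mergeKeywordCounts(array $totals, array $words): array
{
    foreach ($words as $word) {
        $totals[$word] = ($totals[$word] ?? 0) + 1;
    }
    return $totals;
}

// "Map" worker (the role Node.php plays): take a URL, pass its keywords to the reducer.
// $extract is a hypothetical callable that returns an array of keywords for a URL.
function runNodeWorker(callable $extract, string $host = '127.0.0.1', int $port = 4730): void
{
    $client = new GearmanClient();
    $client->addServer($host, $port);

    $worker = new GearmanWorker();
    $worker->addServer($host, $port);
    $worker->addFunction('processUrl', function ($job) use ($client, $extract) {
        $words = $extract($job->workload());
        // Queue the result for the reducer without waiting for it to finish.
        $client->doBackground('countKeywords', json_encode($words));
    });
    while ($worker->work()) {
        // keep pulling jobs until work() reports a failure
    }
}

// "Reduce" worker (the role Master.php plays): keep a running keyword count.
function runMasterWorker(string $host = '127.0.0.1', int $port = 4730): void
{
    $totals = [];
    $worker = new GearmanWorker();
    $worker->addServer($host, $port);
    $worker->addFunction('countKeywords', function ($job) use (&$totals) {
        $words = json_decode($job->workload(), true) ?: [];
        $totals = mergeKeywordCounts($totals, $words);
    });
    while ($worker->work()) {
        // a real reducer would also need to detect completion and persist $totals
    }
}

// Enqueue one background "processUrl" job per URL (the role StartScrapeCommand.php plays).
function startScrape(array $urls, string $host = '127.0.0.1', int $port = 4730): void
{
    $client = new GearmanClient();
    $client->addServer($host, $port);
    foreach ($urls as $url) {
        $client->doBackground('processUrl', $url);
    }
}
```

Because the map and reduce logic are ordinary functions, they can be exercised directly without a Gearman server, which is the testing benefit mentioned earlier.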

Setup

Taking this for a spin is straightforward enough. Fire up an Ubuntu EC2 and then run the following:

OK, now that everything is set up, let’s run the normal PHP implementation.

Looks like about 10-12 seconds to process 100 URLs. Not terrible, but assuming linear growth that means processing 100,000 URLs would take roughly 3 hours (100,000 URLs at 0.10-0.12 seconds each is 10,000-12,000 seconds), which is a bit painful. You can verify it worked by looking at the “bin/nogearman_keyword_results.json” file.

Now, let’s look at the Gearman version. Running the Gearman version is straightforward; just run the following:

You’ll eventually get an output from the “master” when it finishes with the total elapsed time. It’ll probably come in somewhere around 15ish seconds again because we’re still just using a single process to fetch the URLs.

Party in parallel

But now here’s where things get interesting: we can start adding multiple “worker” processes to do some of the computation in parallel. In my experience, the easiest way to handle this is using Supervisor since it makes starting and stopping groups of processes easy and also handles collecting their output. Run the following to copy the config file, restart supervisor, and verify the workers are running:

And now, you’ll want to run “application.php setfive:gearman master” in one terminal and in another run “php setfive:start-scraper 100sites.txt” to kick off the jobs.

Boom! Much faster. We’re still only doing 100 URLs so the effect of processing in parallel isn’t that dramatic. Again, you can check out the results by looking at “bin/keyword_results.json”.

The effects of using multiple workers will be more apparent when you’ve got a larger number of URLs to scrape. Inside the “bin” directory there’s a file named “quantcast_site_lists.tar.gz” which has site lists of different sizes up to the full 1 million from Quantcast.

I ran some tests on the lists using different numbers of workers and the results are below.

	0 Workers	10 Workers	25 Workers
100 URLs	12 sec.	12 sec.	5 sec.
1000 URLs	170 sec.	34 sec.	33 sec.
5000 URLs	1174 sec.	195 sec.	183 sec.
10000 URLs	2743 sec.	445 sec.	424 sec.

One thing to note is that if you run:

and notice that “processUrl” has zero jobs but there are a lot waiting for “countKeywords”, you’re actually saturating the “reducer”, and adding additional worker nodes in Supervisor isn’t going to increase your speed. Testing on an m3.small, I was seeing this happen with 25 workers.
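
The command referred to above did not survive in this copy of the post. As one possible way to get the same information, the Gearman job server answers a plain-text “status” command on its port (4730 by default) with one tab-separated line per function (name, jobs in the queue, jobs running, available workers), ending with a line that contains only a dot; the gearadmin tool’s --status option prints the same data. The sketch below parses that output. The function names and the saturation heuristic are assumptions for illustration, not part of the original project.

```php
<?php
// Parse the text returned by the Gearman "status" admin command.
function parseGearmanStatus(string $raw): array
{
    $status = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        if ($line === '.' || $line === '') {
            continue; // "." marks the end of the listing
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 4) {
            continue; // ignore anything that is not a status row
        }
        [$name, $queued, $running, $workers] = $fields;
        $status[$name] = [
            'queued'  => (int) $queued,
            'running' => (int) $running,
            'workers' => (int) $workers,
        ];
    }
    return $status;
}

// Hypothetical heuristic for the situation described above: the "map" queue is
// drained while "reduce" jobs are still piling up.
function reducerIsSaturated(array $status): bool
{
    $mapQueued = $status['processUrl']['queued'] ?? 0;
    $reduceQueued = $status['countKeywords']['queued'] ?? 0;
    return $mapQueued === 0 && $reduceQueued > 0;
}

// Ask a running job server for its status text.
function fetchGearmanStatus(string $host = '127.0.0.1', int $port = 4730): string
{
    $socket = @fsockopen($host, $port, $errno, $errstr, 5);
    if ($socket === false) {
        throw new RuntimeException("Could not reach Gearman at $host:$port: $errstr");
    }
    fwrite($socket, "status\n");
    $raw = '';
    while (($line = fgets($socket)) !== false) {
        $raw .= $line;
        if (rtrim($line) === '.') {
            break; // end of listing
        }
    }
    fclose($socket);
    return $raw;
}
```

With a job server running, reducerIsSaturated(parseGearmanStatus(fetchGearmanStatus())) gives a quick yes/no on whether adding more “processUrl” workers is likely to help.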

Another powerful feature of Gearman is that it makes running jobs on remote hosts really easy. To add a “remote” to the job server, you’d just need to start a second machine, update the IP address in Base.php, and use the same Supervisor config to start a group of workers. They’d automatically register with your Gearman server and start processing jobs.

Anyway, as always questions and comments appreciated and all the code is on GitHub.
