Data Mining vs Data Collection

by Dan Cave


How do you know the difference?

Data is the hot new thing, and as such it has spawned a bunch of new terms and jargon, which can be pretty hard to keep track of. To help you sound like a data guru instead of a data noob, I’ll be taking you through some of the terms people tend to get a bit confused about.

One of the most common phrases I hear being used incorrectly is Data Mining. There is a very important distinction between Data Mining and Data Collection. I know they sound like they’d be the same thing, but they’re actually very different.

 

So...What is Data Mining?

Data Mining refers to the computational process of discovering patterns in large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, predictive analytics, and database systems.

That’s a fancy way of saying data mining shows you the important patterns inside your existing dataset.  As we all know, data is only as useful as the conclusions we can draw from it. Data mining is simply the process of “mining” that meaning from what would otherwise be an unintelligible spreadsheet.

For example, you might use a data mining program to analyze the buying patterns of your customers and discover that men who bought diapers between Thursday and Saturday were also likely to buy beer.
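To make that a bit more concrete, here is a minimal Python sketch of the kind of calculation a data mining program might run for a rule like "diapers, therefore beer". The baskets are made up for illustration and the function name rule_stats is my own; real tools work on far bigger datasets and use more sophisticated algorithms.

```python
# Hypothetical shopping baskets, invented for illustration (not real sales data).
baskets = [
    {"diapers", "beer", "crisps"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "crisps"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
]

def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    ante = sum(1 for b in baskets if antecedent in b)
    cons = sum(1 for b in baskets if consequent in b)
    support = both / n                           # share of all baskets containing both items
    confidence = both / ante if ante else 0.0    # share of antecedent baskets that also have the consequent
    lift = confidence / (cons / n) if cons else 0.0  # above 1 means the items co-occur more than chance
    return support, confidence, lift

support, confidence, lift = rule_stats(baskets, "diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items turn up together more often than you would expect by chance, which is the sort of pattern worth a closer look.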

 

Not to be confused with...Data Collection

Data Collection, unlike data mining, is exactly what it sounds like: the process of gathering and measuring information, usually with software. There are loads of different data collection techniques and procedures, but when people talk about it in terms of Big Data (which most buzzword lovers do), they usually mean electronic (or online) data collection.

That’s what we do at import.io! We provide a Data Collection tool to help people gather data from the web. The mining of that data is up to you :-)!

 

Let’s Recap

Data Mining - analysing data to find useful patterns

Data Collection - the process of gathering large amounts of data (often from the web)

 


Creating a Startup Manual: Meetings Don't Have to Suck

by Matthew Painter


Let’s face it, meetings suck. They’re long. They’re boring. And there are just so many of them! As part of our mission to make ourselves more efficient over here at import.io, we took a long hard look at the way we do meetings and how we could make them better.


Why Do Meetings Suck?

The key to making meetings better is understanding what it is about them that makes them suck in the first place. One of the most common problems is that there are just too many of them. Meetings, especially bad meetings, can be quite draining. If you’re bouncing from one meeting to the next, you are going to burn out very quickly. Also, if the majority of your working day is spent in meetings then you have no time for actual work.

Not a productive meeting environment


The second problem with meetings is that they are generally poorly planned. People often turn up without really knowing what needs to be accomplished or who is supposed to decide what, and often without having read any of the prep material. All of these factors create a meeting environment that is unfocused and that isn’t conducive to making decisions.

So, how do you have meetings that don’t suck? At import.io we were guilty of having some pretty suck-tastic meetings. Since the New Year we’ve made a concerted effort to reduce the number of meetings we have and to keep improving the quality of the meetings we do have. Here’s how we did it...


Do you really need to meet?

It sounds like a silly question, but it’s an important one. We have five basic types of meetings at import:

1:1s

Each line manager at import.io has a bi-weekly 1 on 1 meeting with each of the people they manage. It’s a chance to find out how your employees are doing and see if there is anything they need from you. It’s also a time for them to bring up any small things that they might not feel are important enough to interrupt the working day with. Now that it’s getting nice outside, we’ve started going for walks during them – which provides an excellent excuse to escape from the office and get some fresh air.

Retrospectives

This is a very specific type of meeting, which I’ll cover in more depth in a later post, but in general these are for discussing how something went and how we can make it better next time. Everything is always clearer in hindsight, as they say!

Assembly

This is known in a lot of organizations as the “All Hands” meeting. David (CEO) and I give a weekly update every Wednesday morning to the entire office. Mostly these updates center around funding, company policies, release information, new hires and major announcements. Then there is time for people to ask questions or raise any issues they feel need to be discussed as a whole company.

Presentations to the whole team go on the big screen


Stand Up

Each department has a meeting each morning that lasts no more than 10 minutes. Each team member lists three things that they want to accomplish that day and says whether or not they accomplished the three things from the previous day. This keeps everyone in the department aware of what each team member is currently working on and eliminates the need to have update meetings. The management team also have a daily standup.

“Decision” Meetings

These are the meetings you generally think of as the ones that suck. The new rule in our office is: if there’s no decision to be made - there is no meeting. Updates do not count as decisions - that is what stand-up and assembly are for. If you really want people to be updated more thoroughly than that, you should send out an email. It may sound obvious, but thinking about what decision you actually want to get made at the end of the meeting really helps to focus it and make sure the right people are in attendance.


How to have a Good “Decision” Meeting

Decisions, Decisions, Decisions

Before you can start planning your meeting or creating a calendar invite, the first question you have to answer clearly is: What decision needs to get made?

Once you know that, the next thing to ask yourself is: Who can make this decision?

Again, I know it sounds a bit obvious, but I’ve been in a number of meetings where everyone shares their opinion about a topic and then we all sit there staring at each other like “Ok, what now?”. On a related note, make sure the decision maker knows that they are the decision maker ahead of time. Don’t spring it on them when they get there - people need time to prepare for these sorts of things (more on that in a minute).


The Invitation

Ok, so clearly you need to invite the decision maker. But be a bit stingy with who else you invite to your meetings. Take the time to think about who really needs to be there and why - do you really need the entire marketing department or will just one representative do? What input do you think the decision maker will need before he or she can make the decision?

Do you really need all these people at this meeting?


Now that you’ve got a list of the people you think need to attend the meeting, you can create the calendar invite. We use Google Calendar for all our meetings so you can see when everyone is free. Try to pick a time when everyone is available and try not to schedule meetings back to back for any of the participants - I realize this isn’t always possible. At import we try to maintain a ban on meetings before lunch, since this is supposed to be deep work time.  

When creating your calendar invite, make sure you include all the relevant information. In addition to the decisions that need to be made and who the decision maker is, you’ll need to let everyone know specifically what you want them to have read or prepared for the meeting and what materials they should bring with them. You should also attach any relevant files, presentations, articles, etc. that you want everyone to look at. If there is a lot of prep work you want the attendees to do before the meeting, it’s important that you schedule the meeting far enough out that they have time to actually look at the material - don’t ask me to read a 20-page article a few hours before the meeting is due to start. It just isn’t going to happen.


The Meeting

You’ve done it! You got all the right people in the right place at the right time having read the right prep material. Now let’s go over a bit of in-meeting etiquette.

  1. If you don’t strictly need your laptop, don’t bring it - it is way too easy to become distracted by emails or instant messages - or in one specific import.io case Wikipedia (you know who you are :)

  2. Stick to the agenda - don’t get bogged down on details that aren’t important, make a decision and move on.

  3. Try to end early - if you can get all the decision making done in 20 minutes for a 1 hour meeting great! Don’t sit around and find other things to talk about.

  4. Let everyone have a chance to speak - you invited them all for a reason, it’s important that they get their say. Don’t let one person monopolize the meeting.  

  5. Record the actions of the meeting - not only this, but record why you decided on those actions because in 6 months you may not be able to remember what the rationale was.

Write down the decision AND the rationale behind it


 

And that’s it! Implementing these simple changes has really helped us to cut down on the number of meetings we have and has reduced the length of the meetings we do need to have.

 

Next up: Feature Development


Making the Most of Datasets!

by Alex Gimson


An import.io Webinar Production

Thanks again to everyone who came to our webinar on Datasets yesterday! I think Chris and I are starting to really get the hang of these. As usual we’ve recorded the whole thing and put it up on YouTube so you can refer back to it whenever you need to.

A Short Recap

For those of you who don’t know, the Dataset page is where you can see all the data you’ve extracted! From there you can refresh your data, query your Connectors, download it to your machine or share it with your friends. It’s also the place where you can combine multiple data sources together and a good place to access our integrate page.

First I showed you how to create a new Dataset and add your Data Sources to it. Then I walked you through all the options you have such as refreshing your data, saving it, sharing it and downloading it to your machine! If you want a more in-depth refresher on these topics, check out this tutorial!

And just in case you happen to be as big a Football fan as I am: here is the data I used.

Next, I showed you the specifics of what you can do with each of the different types of data sources you can create using our tool!

  1. Extractors - add in a new URL

  2. Crawlers - re-crawl the site

  3. Connectors - querying in the dataset page and doing multiple queries

Then things got really exciting when I showed you how to combine multiple different Connectors to create a Mix, allowing us to search one term across multiple sites! You can try my Mix of UK supermarkets for yourself, and find the cheapest place to do all your shopping.

Finally, I showed you how to do a simple integration of your Dataset with Google Sheets. You can learn more about integrations by reading the tutorials below or visiting our integrate page yourself!

 

Your Questions

Can you see which of the URLs have had data updated from the last crawl?

This isn’t currently possible to do with our UI - we’re working on it! You can get the previous crawls over the API though and then compare the data yourself with a simple script. If you want to know more about how to do this, just email us at support@import.io and we’ll show you!
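To give you a rough idea of what that "simple script" could look like, here is a minimal Python sketch. It assumes you have already downloaded two crawls as lists of rows (the sample rows, the "url" key and the function name diff_crawls are all made up for illustration) and it deliberately leaves out the API calls themselves.

```python
def diff_crawls(previous, current, key="url"):
    """Compare two crawls (lists of row dicts) and group URLs by what happened to them."""
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}
    # URLs only in the new crawl, only in the old crawl, or in both with different data.
    added = sorted(curr_by_key.keys() - prev_by_key.keys())
    removed = sorted(prev_by_key.keys() - curr_by_key.keys())
    changed = sorted(
        k for k in curr_by_key.keys() & prev_by_key.keys()
        if curr_by_key[k] != prev_by_key[k]
    )
    return {"added": added, "removed": removed, "changed": changed}

# Hypothetical rows standing in for two crawls you have already fetched.
previous_crawl = [
    {"url": "http://example.com/a", "price": "1.00"},
    {"url": "http://example.com/b", "price": "2.00"},
]
current_crawl = [
    {"url": "http://example.com/a", "price": "1.20"},
    {"url": "http://example.com/c", "price": "3.00"},
]

result = diff_crawls(previous_crawl, current_crawl)
print(result)
```

Keying the rows by URL keeps the comparison simple: anything under "changed" is a page whose data differs from the last crawl.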

 

Can I get data from behind a JavaScript action?

You sure can! By default whenever you create a data source, we first try to get the data with JavaScript turned off - because it’s easier. But import.io Connectors do support getting data from sites that require JavaScript. Simply follow the instructions in this tutorial and, if you can’t see your data in the Detect Optimal Settings step, click “No”. This will turn JavaScript on and you can carry on building your Connector as normal!

 

Are there any websites that cannot be crawled?

Every website is different. Some websites are easier to crawl than others - it all depends on how the HTML is structured. Because import.io Crawlers are really Extractors, we find that we have a pretty good success rate (especially now that you can crawl with JavaScript). Try these tips and tricks first. If you find you’re having trouble getting your crawler to work just email us at support@import.io. If a crawler doesn’t work, you may also have more luck using an Extractor or a Connector, and still get the data you need.

 

Can you get product reviews?

You can definitely get product reviews. If you want product information and the product reviews you will need to build two different extractors (because the data is different) and then combine them in a Dataset. Chris A actually built a web app that does just that for Amazon music!
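If you would rather combine the two sets of data yourself after downloading them, the idea looks roughly like the Python sketch below. The field names ("product_id", "title", "text"), the sample rows and the function name combine are assumptions for illustration; your extractors will have whatever columns you trained them with.

```python
from collections import defaultdict

def combine(products, reviews, key="product_id"):
    """Attach each product's reviews to its product row, matching on a shared key."""
    reviews_by_product = defaultdict(list)
    for review in reviews:
        reviews_by_product[review[key]].append(review["text"])
    # Products with no reviews get an empty list rather than being dropped.
    return [
        {**product, "reviews": reviews_by_product.get(product[key], [])}
        for product in products
    ]

# Hypothetical rows standing in for the output of two separate extractors.
products = [
    {"product_id": "p1", "title": "Album One"},
    {"product_id": "p2", "title": "Album Two"},
]
reviews = [
    {"product_id": "p1", "text": "Great"},
    {"product_id": "p1", "text": "Too short"},
]

combined = combine(products, reviews)
print(combined)
```

The important part is that both sources share some identifier (here a product ID) so the rows can be matched up.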


 

Are there limits to integrating with Google Sheets?

Because of the way the Google Sheets integration works, you can only get one page of data from one source at a time. This means that even if you train your Connector with pagination, when you integrate it (with Google Sheets) you’ll only be able to see the first page of results.

 

Join Us Next Time

For our next webinar Chris and I will be teaming up with the lovely Jewel Loree from Tableau to show you how to get data and visualize it! Sign up here to join us on the 22nd of April at 4pm GMT.

If you have an idea for a webinar you’d like to see, email me at support@import.io!

 


The Tech World is Ageist!

by David White


If there’s one thing I just don’t get about the tech scene - whether here or in the States - it’s the obsession with age. It’s bizarre.


No Oldies Allowed

I recently had an encounter with a television network (who will remain nameless) that highlighted this very obsession. They initially got in touch because they wanted to make a documentary about a successful London startup. They were very interested in import.io since we are doing well and are fairly well known in the London tech scene. We are also planning to move part of our operations out to Silicon Valley and have been expanding rapidly as a company. Everything was adding up.

Then, a few weeks ago, I got a call saying they had decided to drop us. When I asked why, they told me they were only interested in startups whose founders are under 25. Now, at 43 I realize I’m not the striking young man I used to be. But, come on! That hardly makes me a less interesting CEO. In fact, I’d argue it makes me a more interesting one.

The more I thought about this, the more confused - and, I’ll admit, a little hurt - I became. As far as I can work out, tech is the only area of business where people are so fascinated with age. Think about it: if a major bank put out a press release saying they were hiring a 25-year-old as the next CEO, everyone would freak. And for good reason. Someone who’s 25 has very little (probably no) practical CEO - or even banking for that matter - experience.


Experience vs Youth

There is this strange idea in tech that only young people can be innovative or have good ideas. I certainly don’t think this is the case. Surely, the far more important question, when looking at a CEO, should be “What relevant experience does he/she have?”. If you’re 25 and making a sexting app, sure, you probably have relevant experience. But, if you’re trying to solve a real industry issue, I’d say most 25-year-olds I’ve met would be totally useless.

I’m not saying that younger founders don’t have good ideas, or that they can’t make anything other than simple sexting apps. Far from it. But, if you look at the data behind who have been the most successful founders in tech or made the most disruptive technologies, it hasn’t been the fresh-out-of-uni types, albeit with a few notable - and well publicised - exceptions.


Here's the proof

This isn’t just me being a grouch. Several studies have shown that having some real-world experience under your belt is beneficial when starting a company. The Founder Institute recently conducted a study which showed that age correlates with entrepreneurial success up to the age of 40 (after which point it makes minimal difference). And Business Insider found that more than 50% of founders whose companies were valued at $25M+ were over the age of 30.

Chart courtesy of Business Insider


The anecdotal evidence largely supports these findings as well. Jeff Bezos started Amazon at 30, Jack Dorsey co-founded Twitter at 30, Reid Hoffman co-founded LinkedIn at 35, Elon Musk launched SpaceX at 31 and Huddle co-founders Andy McLoughlin and Alastair Mitchell are 33. Even the founders of WhatsApp (Jan Koum and Brian Acton) - arguably the most successful tech exit to date - were 35 and 39 respectively.

Infographic courtesy of Funders and Founders


I think if you really look at the track record of tech companies, you’ll see that most of them do value relevant experience over youth. One of the first things Google did when they started gaining traction was bring in Eric Schmidt (46), who was a seasoned tech professional.

 

A dangerous precedent

So clearly there is a need for, and history of, seasoned professionals in tech startups. But, if you believe the tech press, you’d think that all Silicon Valley founders are young white males. I think this is a huge detriment to the culture of startups in general. The tech industry in San Francisco has essentially become a bit of a boys’ club in a lot of respects. Left unchecked, this mindset can become dangerous very quickly - trust me, I worked at a major bank in the 90s. If the PR disaster that was Julie Ann Horvath leaving GitHub tells us anything, it’s that Silicon Valley needs a serious reality check.

At the risk of sounding like a bitter old man, this attitude around age needs to change. To a certain extent I think it may be more of a media attitude than anything else. But the media has a large impact on the way people in and outside of Silicon Valley think and act. The media needs to focus on great entrepreneurs. Great entrepreneurs are all ages, all sexes and may not even be based in the Valley!


Wifi and Plugs

by Nick Scott


Our good friend and import.io early-adopter Graham Paterson is at it again! You may remember him as the genius behind ThatGift, which we covered back at Christmas. This time Graham has used import.io to create a map to help you find free Wifi!


How Does He Do It?

Graham’s latest side project, Wifi and Plugs, shows you a handy map of all the Wifi hotspots around London. To build it he first used import.io to collect data from the popular site Foursquare. Armed with a list of businesses which offer free Wifi, he used a tool called GeoCoding to turn their addresses into longitude and latitude. Then he uploaded these coordinates into MapBox to create the handy map you can see here.
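If you fancy trying something similar, here is a minimal Python sketch of the last step: turning a list of already-geocoded venues into GeoJSON, a common format that mapping tools can read. This is only an illustration, not Graham’s actual code; the venue names, coordinates and the function name to_geojson are made up, and we don’t know exactly which format he uploaded.

```python
import json

def to_geojson(venues):
    """Build a GeoJSON FeatureCollection of points from rows with name, lat and lon."""
    features = []
    for venue in venues:
        features.append({
            "type": "Feature",
            # GeoJSON puts longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [venue["lon"], venue["lat"]]},
            "properties": {"name": venue["name"]},
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical venues; the coordinates are assumed to come from a geocoding step.
venues = [
    {"name": "Hypothetical Cafe", "lat": 51.5246, "lon": -0.0871},
    {"name": "Made-up Coffee House", "lat": 51.5136, "lon": -0.1365},
]

geojson = to_geojson(venues)
print(json.dumps(geojson, indent=2))
```

One thing that catches people out: GeoJSON wants longitude before latitude, which is the opposite of how most of us say it out loud.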


Now comes the clever part: users can submit other known Wifi spots to be included on the map. Thanks to Graham we can now crowdsource our way to free Wifi!


What’s Next?

Graham says Wifi and Plugs was actually a practice run for developing an app that showcases different types of drinking spots in London (late-night, beer garden, sports bar, etc). Can’t wait for that one!

Wifi and Plugs is another great example of what a little imagination and access to the right tools can create!


Give it a go!

Have a great idea for what to do with data? Share it with us!

To get you started, here’s a quick little how-to showing you how Graham did it!