Friday, 28 November 2014

Scraping Online Communities for your Outreach Campaigns

Online communities offer a wealth of intelligence for blog owners and business owners alike.

Exploring the data within popular communities will help you to understand who the major influencers are, what content is popular and who are the key content aggregators within your niche.

This is all fair and well to talk about, but is it feasible to be manually sorting through online communities to find this information out? Probably not.

This is where data scraping comes in.

What is Scraping and What Can it do?

I’m not going to go into great detail on what data scraping actually means, but to simplify this, here’s a definition from the Wikipedia page:

    “Data scraping is a technique in which a computer program extracts data from human-readable output coming from another program.”

Let me explain this with a little example…

Imagine a huge community full of individuals within your industry. Each person within the community has a personal profile page that contains information about their interests, contact details, social profiles, etc.

If you were tasked with gathering all of this data on all of the individuals then you might start to hyperventilating at the thought of all the copy and pasting you’d need to do.

Well, an alternative is to scrape all of this content so that you can automate all of this process and easily export all of this information into a manageable, more consumable format in a matter of seconds. It’d be pretty awesome, right?

Luckily for you, I’m going to show you how to do just that!

The Example of Inbound.org

Recently, I wanted to gather a list of digital marketers that were fairly active on social media and shared a lot of content online within communities. These people were going to be some of my core targets to get content from the blog in front of.

To do this, I first found some active communities online where these types of individuals hang out. Being a digital marketer myself, this process was fairly easy and I chose Inbound.org as my starting place.

Scoping out Data Requirements

Each community is different and you’ll be able to gather varying information within each.

The place to look for this information is within the individual user profile pages. This is usually where the contact information or links to social media accounts are likely to be displayed.

For this particular exercise, I wanted to gather the following information:

    Full name
    Job title
    Company name and URL
    Location
    Personal website URL
    Twitter URL, handle and follower/following stats
    Google+ URL, follower count and list of contributor URLs
    Profile image URL
    Facebook URL
    LinkedIn URL

With all of this information I’ll be able to get a huge amount of intelligence about the community members. I’ll also have a list of social media accounts to add and engage with.
On top of this, with all the information on their websites and sites that they write for, I’ll have a wealth of potential link building prospects to work on.

Inbound.org Profiles

You’ll see in the above screenshot that a few of the pieces of data are available to see on the Inbound.org user profiles. We’ll need to get the other bits of information from the likes of Twitter and Google+, but this will all stem from the scraping of Inbound.org.

Sign Up To My Newsletter
Scraping the Data

The idea behind this is that we can set up a template based on one of the user profiles and then automate the data gathering across the rest of the profiles on the site.

This is where you’ll need to install the SEO Tools plugin for Excel (it’s free). If you’ve not used this plugin before, don’t worry – I’ve put together a full tutorial here.

Once you’ve installed the plugin, you’re good to go on the actual scraping side of things…

Quick Note: Don’t worry if you don’t have a good knowledge of coding – you don’t need it. All you’ll need is a very basic understanding of reading some code and some basic Excel skills.

To begin with, you’ll need to do a little Excel admin. Simply add in some column titles based around the data that you’re gathering. For example, with my example of Inbound.org, I had, ‘Name’, ‘Position’, ‘Company’, ‘Company URL’, etc. which you can see in the screenshot below. You’ll also want to add in a sample profile URL to work on building the template around.

spreadsheet admin

Now it’s time to start getting hands on with XPath.

How to Use XPathOnURL()

This handy little formula is made possible within Excel by the SEO Tools plugin. Now, I’m going to keep this very basic because there are loads of XPath tutorials available online that can go into the very advanced queries that are possible to use.

For this, I’m simply going to show you how to get the data we want and you can have a play around yourself afterwards (you can download the full template at the end of this post).

Here’s an example of an XPath query that gathers the name of the person within the profile that we’re scraping:

=XPathOnUrl(A2, "//*[@id='user-profile']/h2")

A2 is simply referencing the cell that contains the URL that we’re scraping. You’ll see in the screenshot above that this is Jason Acidre’s profile page.

The next part of the formula is the XPath.

What this essentially says is to scrape through the HTML to find a tag that has ‘user-profile’ id attached to it. This could be a div, span, a or whatever.

Once it’s found this tag, it then needs to look at the first h2 tag within this area and grab the text within it. This is Jason’s name, which you’ll see in the screenshot below of the code:

website code

Don’t be put off at this stage because you don’t need to go manually trawling through code to build these queries, there’s a much simpler way.

The easiest way to do this is by right-clicking on the element you want to scrape on the webpage (within Chrome); for example, on Inbound.org, this would be the profile name. Now click ‘Inspect element’.

inspect element

The developer tools window should now appear at the bottom of your browser (or in a separate window). Within that, you should see the element that you’ve drilled down on.

All you need to do now is right-click on it and press ‘Copy XPath’.

copy XPath

This will now copy the XPath code for your Excel formula to the clipboard. You’ll just need to add in the first part of the query, i.e. =XPathOnUrl(A2,

You can then paste in the copied XPath after this and add a closing bracket.

Note: When you use ‘Copy XPath’ it will wrap some parts of the code in double apostrophes (“) which you’ll need to change to single apostrophes. You’ll also need to wrap the copied XPath in double apostrophes.

Your finished code will look like this:

=XPathOnUrl(A2, "//*[@id='user-profile']/h2")

You can then apply this formula against any Inbound.org profile and it will automatically grab the user’s full name. Pretty good, right?

Check out the full video tutorial below that I’ve put together that talks you through this whole process:

[sws_blue_box box_size=””] Want more useful video tutorials? Subscribe to my YouTube channel now![/sws_blue_box]

XPath Examples for Grabbing Other Data

As you’re probably starting to see, this technique could be scaled across any website online. This makes big data much more attainable and gives you the kind of results that an expensive paid tool would offer without any of the cost – bonus!

Here’s a few more examples of XPath that you can use in conjunction with the SEO Tools plugin within Excel to get some handy information.

Twitter Follower Count

If you want to grab the number of followers for a Twitter user then you can use the following formula. Simply replace A2 with the Twitter profile URL of the user you want data on. Just a quick word of warning with this one; it looks like it’s really long and complicated, but really I’ve just used another Excel formula to snip of the text ‘followers’ from the end.

=RIGHT(XPathOnUrl(D57,"//li[@class='ProfileNav-item ProfileNav-item--followers']"),LEN(XPathOnUrl(D57,"//li[@class='ProfileNav-item ProfileNav-item--followers']"))-10)

Google+ Follower Count

Like with the Twitter follower formula, you’ll need to replace A2 with the full Google+ profile URL of the user you want this data for.

=XPathOnUrl(H67,"//span[@class='BOfSxb']")

List of ‘Contributor to’ URLs

I don’t think I need to tell you the value of pulling in a list of websites that someone contributes content to. If you do want to know then check out this post that I wrote.

This formula is a little more complex than the rest. This is because I’m pulling in a list of URLs as opposed to just one entity. This requires me to use the StringJoin function to separate all of the outputs with a comma (or whatever character you’d like).

Also, you may notice that there is an additional section to the XPath query, “href”. This pulls in the link within the specific code block instead of the text.

As you’ll see in the full Inbound.org scraper template that I’ve made, this is how I pull in the Twitter, Google+, Facebook and LinkedIn profile links.

You’ll want to replace A2 with the Google+ profile URL of the person you wish to gather data on.

=StringJoin(", ",XPathOnUrl(A2,"//a[@rel='contributor-to nofollow']","href"))

Twitter Profile Image URL

If you want to get a large version of someone’s Twitter profile image then I’ve got just the thing for you.

Again, you’ll just need to substitute A2 with their Twitter profile URL.

=XPathOnUrl(A2,"//*[@class='profile-picture media-thumbnail js-tooltip']","data-resolved-url-large")

Some Findings from the Data I’ve Gathered

With all big data sets will come some interesting findings. Here’s a few little things that I’ve found from the top 100 influential users on Inbound.org.

average followers chart

The chart above maps out the average number of followers that the top 100 users have on both Twitter (12,959) and Google+ (9,601). As well as this, it shows the average number of users that they follow on Twitter (1,363).

The next thing that I’ve looked at is the job titles of the top 100 users. You can see the most common occurrences of terms within the tag cloud below:

Job titlesFinally, I had a look through all of the domains listed within each of the top 100 Inbound.org users’ Google+ ‘contributor to’ sections and mapped out the most frequently mentioned sites.

Here’s the spread of domains that were the most popular to be contributed to:

domain frequency

It Doesn’t Stop There

As you’ve probably gathered, this can be scaled out across pretty much any community/forum/directory/website online.

With this kind of intelligence in your armoury, you’ll be able to gather more intelligence on your targets and increase the effectiveness of your outreach campaigns dramatically.

Also, as promised, you can download my full Inbound.org scraper template below:

[sdfile url=”http://www.matthewbarby.com/goodies/MatthewBarby-Inbound-Scraper.xlsx” redirect=”http://www.matthewbarby.com/thanks-downloading-inbound-scraper/”]

TL;DR

    Online communities hold valuable data on your target audiences – use it!

    Scale out your intelligence gathering by brushing up on your XPath.

    Download my Inbound.org scraper template and let it work its magic.

Source:http://www.matthewbarby.com/scraping-communities-with-xpath/

Thursday, 27 November 2014

Web Data Extraction: driving data your way

Most businesses rely on the web to gather data such as product specifications, pricing information, market trends, competitor information, and regulatory details. More often than not, companies collect this data manually—a process that not only takes a significant amount of time, but also has the potential to introduce costly errors.

By automating data extraction, you're able to free yourself (and your pointer finger) from hours of copy/pasting, eliminate human errors, and focus on the parts of your job that make you feel great.

Web data extraction: What it is, why it's used, and how to get it right on an ongoing basis

Web data extraction, screen scraping, web harvesting—while these terms may have different connotations they all essentially point to the same thing: plucking data from the web and populating it in an organized way in another place for further analysis or more focused use. In an era where “big data” has become a commonplace concept, the appeal of web data extraction has grown; it’s an extremely efficient alternative to web browsing, and culls very specific data for a focused purpose.

How it's used

While each company’s needs vary, data extraction is often used for:

    Competitive intelligence, including web popularity, social perception, other sites linking to them, and placement of competitor advertisements

    Gathering financial data including stock market movement, product pricing, and more

    Creating continuity between price sheets and online websites, catalogs, or inventory databases

    Capturing product specifications like dimensions, color, and materials

    Pulling tabular data from multiple sources for in-depth analysis

Interestingly, some people even find that web data extraction can aid them in their leisure time as well, pulling data from blogs and websites that pertain to their hobbies or interests, and creating their own library of organized information on a topic. Perhaps, for instance, you want a list of all the designers that George Clooney wears (hey- we won’t question what you do in your free time). By using web scraping tools, you could automatically extract this type of data from, say, a fashion blogger who follows celebrity style, and create your own up-to-date shopping list of items.

How it's done

When you think of gathering data from the web, you should mentally juxtapose two different images: one of gathering a bucket of sand one grain at a time, and one of filling a bucket with a shovel that has the perfect scoop size to fill it completely in one sitting. While clearly the second method makes the most sense, the majority of web data extraction happens much like the first process--manually, and slowly.

Let’s take a look at a few different ways organizations extract data today.

The least productive way: manually

While this method is the least efficient, it’s also the most widespread. On the plus side, you need to learn absolutely nothing except “Ctrl+C/V” to use this method, which explains why it is the generally preferred method, despite the hours of time it can take. Imagine, for instance, managing a sales spreadsheet that keeps inventory up to date so that the information can be properly disseminated to a global sales team. Not only does it take a significant amount of time to update the spreadsheet with information from, say, your internal database and manufacturer’s website, but information may change rapidly, leaving sales reps with inaccurate information regardless.

Finding someone in the organization with a talent for programming languages like Python

Generally, automating a task without dedicated automation software requires programming, and therefore an internal resource with a solid familiarity with programming languages to create the task and corresponding script. While most organizations do, in fact, have a resource in IT or engineering with this type of ability, it often doesn’t seem like a worthy time investment for that person to derail the initiatives he or she is working on to automate web data extraction. Additionally, if companies do choose to automate using in-house resources, that person will find himself beholden to a continuing obligation, since he or she will need to adjust scripting if web objects and attributes change, disabling the task.

Outsourcing via Elance or oDesk

Unless there is a dedicated resource ready to automate and maintain data extraction processes (and most organizations wouldn’t necessarily choose to use their in-house employee time this way), companies might turn to outsourcing companies such as Elance or oDesk to hire contract help. While this is an effective way to automate a task using a resource that has a level of acumen in automation, it represents an additional cost--be it one time or on a regular basis as data extraction requirements change or increase.

Using Excel web queries

Since more often than not, data extracted from the web is often populated into an Excel spreadsheet, it’s no wonder that Excel includes web query tools expressly for that purpose. These tools are particularly useful in pulling tabular data from a website (such as product specifications, legal codes, stock prices, and a host of other information) and automatically pushing the data into a spreadsheet. Excel queries do have limitations and a learning curve, however, particularly when creating dynamic web queries. And clearly, if you’re hoping to populate the information in other sources, such as external databases, there is yet another level of difficulty to navigate.

How automation simplifies web data extraction

Culling web data quickly

Using automation is the simplest way to extract web data. As you execute the steps necessary to perform the task one time, a macro recorder captures each action, automatically generates an easily-editable script, and lets you specify how often you would like to repeat the task, and at what speed.

Maintaining the highest level of accuracy

With humans copy/pasting data, or comparing between multiple screens and entering data manually into a spreadsheet, you’re likely to run into accuracy issues (sometimes directly proportionate to the amount of time spent on the task and amount of coffee in the office!) Automation software ensures that “what you see is what you get,” and that data is picked up from the web and put back down where you want it without a hitch.

Storing web data in your preferred format

Not only can you accurately transfer data with automation software, you can also ensure that it’s populated into spreadsheets or databases in the format you prefer. Rather than simply dumping the data into a spreadsheet, you can ensure that the right information is put into the proper column, row, field, and style (think, for instance, of the difference between writing a birth date as “03/13/1912” and “12/3/13”).

Simplifying data analysis

Automation software allows you to aggregate data from disparate sources or enormous stockpiles of structured or unstructured data in a way that makes sense for your business analysis needs. This way, the majority of employees in an organization can perform some level of analysis on their own, making it easier to surface information that informs business decisions.

Reacting to changes without a hitch

Because automation software is built to recognize icons, images, symbols, and other objects regardless of their position on a screen, it can automate processes in a self-perpetuating manner. For example, let’s say you automate data retrieval from a certain chart on a retailer’s website without automation software. If the retailer decides to move that object to another area of the screen, your task would no longer produce accurate results (or work at all), leaving you to make changes to the script (or find someone who can), or re-record the task altogether. With image recognition capabilities, however, the system “memorizes” the object itself, not merely its coordinates, so that the task can continue to run irrespective of changes.

The wide sweeping appeal of automation software

Companies often pick a comprehensive automation solution not only because of its ability to effectively automate any web data extraction task, but also because it goes beyond data extraction. Automation software can permeate into other areas of the business as well, making tasks such as application integration, data migration, IT processes, Excel automation, testing, and routine tasks such as launching applications or formatting files faster and more accurate. Because it requires no programming experience to use, adoption rates are higher and businesses get more “bang for their buck.”

Almost any organization can benefit from using automation software, particularly as they grow and scale. If you are looking to quit “moving grains of sand” and start claiming back time in your day, there are a few steps you can take:

 Watch a short video that shows how web data extraction is done with automation software

 Download a free trial and start reaping the benefits of downloading even just a couple of tasks today.

 See how tasks are automated with our short, step-by-step how-to-sheets (and then give it a try yourself!)

Source: https://www.automationanywhere.com/web-data-extraction

Tuesday, 25 November 2014

Outsourcing Data Mining is a Wise Business Decision

Most businesses nowadays have a large volume of raw data that is never processed, because of the lack of time or resources. If your business is facing a similar situation, then you are missing out on valuable information. Without the right information, your company will be unable to make accurate business decisions.

The right information can play a key role in promoting the growth of your business. When unprocessed data is entered, filtered, classified and converted into a workable format, it can be used to maximize your profits, ameliorate your risks and run a seamless workflow.

Over the years, data mining has proved to be extremely useful in various industries, be it, healthcare, direct marketing, e-commerce, finance, customer relationship management or telecommunications. With the right information, companies have been able to make fast and effective business decisions.

Why outsource data mining?

Data mining requires the expertise of professional business and financial analysts who understand how to acquire important information from vast amounts of data. If data mining is done in-house, it can become expensive and time consuming. It can also shift your focus away from core business activities. Outsourcing data mining on the other hand is more fast, cost-effective and can give you access to professional services.

4 commonly outsourced data mining functions

Most companies outsource one or more of the following data mining functions to India:

1. Data congregation: Data is extracted from various web pages and websites, by using methods like web and screen scraping. The collected data is then entered into a database.

2. Contact data collection: Different websites are searched and information concerning contacts is collected.

3. E-commerce data: Data about varied online stores are collected, taking into account information about prices, discounts and products.

4. Data about competitors: Data about business competitors are collected to help a company gauge itself against its competition. With such valuable data, you can effectively re-design your marketing strategy and pricing matrix.

8 advantages of outsourcing data mining to India

With data mining out of your hands, your business can make huge savings in terms of time, money and infrastructure. The following are some of the benefits that you can leverage by outsourcing data mining to India:

    Get qualified and highly skilled data mining experts to work for you at an extremely affordable cost

    Be assured of the quality of information, as Indian data entry companies only extract information from reliable websites and databases

    Save on the cost of investing on the latest data mining software and technology, as your Indian service provider will be making these investments

    Get your data processed within a short turnaround time of 3,6 or 12 hours as Indian data mining companies can provide efficient data mining within a few hours

    When compared to in-house data mining, outsourcing data mining can be a lot cheaper and also bring you better results

    Stay assured about the complete privacy, security and confidentiality of your valuable data as Indian data mining companies use the latest technology to ensure 100% safety

    Get access to data with a wide market coverage as your Indian data mining provider will be serving many business with varied data mining needs

    Improve your overall productivity and generate more profits by making informed decisions about your business

Have you outsourced data mining before? If yes, which data mining service did you outsource? Did you find outsourcing more advantageous that in-house data mining. Let us know.

Source: http://blog.flatworldsolutions.com/outsourcing-data-mining-is-a-wise-business-decision/