Getting started with web automation in C#
LEARNING BY EXAMPLES
Summary
You can find my website at http://foggymountainsolutions.com/
or reach me by email at support@foggymountainsolutions.com
This e-book was created to teach different methods of carrying out web automation using
C# and the .NET Framework. It shows readers how to apply these methods using Visual
Studio and common automation libraries like Chromedriver and Selenium.
The book also demonstrates related techniques such as web crawling and web scraping,
which can be relevant from a website analytics perspective.
The two techniques complement one another and together become a powerful tool. They
create many opportunities, give insight into how things run behind the curtains, and show
how to use that knowledge to meet a user’s specific needs and goals.
Common usage
Automating social media (follow/unfollow, liking/sharing).
Providing statistics not available otherwise (for example, correlations in Facebook groups).
Scraping information from websites (images, product information, emails etc).
Contents
Summary
Getting started
   It all starts with an idea
   Concepts of web automation and scraping the web
   Visual Studio, C# and the .NET Framework
       Chromedriver
       Selenium
       HtmlAgilityPack
   Getting your tools ready
       Step one
       Step 2
       Step 3
   Dealing with x-paths
How to make a program for web automation
   Automation application
       Step 1
       Step 2
       Step 3
       Step 4
       Step 5
Code
Getting started
IT ALL STARTS WITH AN IDEA
As many people will tell you, great programs are created from great ideas. Knowing how to
get there is the last thing to worry about. The other major ingredient for success in
programming is motivation. A great idea combined with a genuine interest in what you are
going to create gives you the mindset needed to not give up and to deliver quality
programs.
      Do you want to create bots?
      Do you want to scrape information?
I believe strongly in learning by examples. Whenever someone asks me where to start with
programming, I always tell them to find an area of interest, whether it’s web development
or building bots, and start with something small. Keep it interesting and easy. This will
give you the gratifying feeling of having created something, which can then propel you
further on your programming journey.
                             Figure 1 - flowchart of taking action
CONCEPTS OF WEB AUTOMATION AND SCRAPING THE WEB
Why do we automate things?
Automation generates very tangible benefits, the most obvious one being saved time. Saving
time in turn brings other gains, like saving money and resources. From a user’s point of
view, it often doesn’t matter how a program works, only that it does the job. This is in
essence why these kinds of services exist and why we do what we do.
VISUAL STUDIO, C# AND THE .NET FRAMEWORK
.NET Framework (pronounced dot net) is a software framework developed
by Microsoft that runs primarily on Microsoft Windows. It includes a large class
library named Framework Class Library (FCL) and provides language
interoperability (each language can use code written in other languages) across
several programming languages. Programs written for .NET Framework execute in
a software environment (in contrast to a hardware environment) named Common
Language Runtime (CLR), an application virtual machine that provides services such as
security, memory management, and exception handling. (As such, computer code written
using .NET Framework is called "managed code".) FCL and CLR together constitute .NET
Framework.
Libraries of interest
      Chromedriver (Handling JavaScript and headless browsing)
      Selenium (Website interaction)
      HtmlAgilityPack (Parsing HTML DOM elements)
Chromedriver
We use Chromedriver for two purposes. One is to give us a way to load dynamic content,
such as content generated by JavaScript. This could be images that load only after you
scroll down a web page and that won’t exist until a certain criterion has been met. The
website simply needs to be rendered in an emulated browser for this to work properly. If
you don’t have a way of dealing with this dynamic content, you won’t be able to do much of
anything.
This leads us to the other purpose of using Chromedriver, which is headless browsing. It
means that the emulated web browser won’t have a visible window that pops up when the
program is running, which is something you normally don’t want users to see.
There are also other nice benefits to Chromedriver, such as support for proxies and custom
user agents.
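As a rough illustration of the options mentioned above, the following sketch builds a set of Chrome options for headless browsing with a custom user agent and a proxy. The class name BrowserSetup, the method name BuildOptions and the example values are assumptions made for this sketch; the three switches are ordinary Chrome command-line arguments passed through ChromeOptions.AddArgument.

```csharp
using OpenQA.Selenium.Chrome;

public static class BrowserSetup
{
    // Builds Chrome options for a hidden browser window.
    // userAgent and proxyAddress are hypothetical values supplied by the caller.
    public static ChromeOptions BuildOptions(string userAgent, string proxyAddress)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless");                      // no visible browser window
        options.AddArgument("--user-agent=" + userAgent);       // replace the default user agent
        options.AddArgument("--proxy-server=" + proxyAddress);  // send traffic through a proxy
        return options;
    }
}
```

The result could then be handed to the driver, for example with new ChromeDriver(BrowserSetup.BuildOptions("ExampleBot/1.0", "http://127.0.0.1:8080")), where both argument values are placeholders.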
Selenium
Selenium is our second tool for web browser automation. It is used in conjunction with
Chromedriver and will allow us to do things like filling out forms, pressing buttons and
other similar types of website interaction.
HtmlAgilityPack
This is the library we use for HTML parsing. It isn’t needed in every type of automation
program, but it is useful for data-gathering purposes.
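To give a feel for what HTML parsing looks like, here is a minimal sketch that loads an HTML string and collects the target of every link in it. The class name LinkParser and the method name ExtractLinks are made up for this example; LoadHtml, SelectNodes and GetAttributeValue are the HtmlAgilityPack calls it relies on.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkParser
{
    // Returns the href value of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        if (nodes == null)
        {
            return links; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            links.Add(node.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

Calling LinkParser.ExtractLinks with a string such as "<a href='/about'>About</a>" should give a list containing "/about".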
GETTING YOUR TOOLS READY
In this portion you will learn how to download and install Visual Studio and how to start
your first project.
Visual Studio can be downloaded here. The Community edition is free to use. Install
instructions can be found here.
Now you will learn how to create your first WinForms project.
Step one
Go to File -> New -> Project
Step 2
Select Windows Forms Application, give it the name FirstWinformsApplication and click
OK.
Step 3
Next we install the libraries we need. To do this, right-click your solution and select
“Manage NuGet Packages For Solution..”
In the NuGet Manager, go to “Browse”, search for HtmlAgilityPack and click Install.
Repeat this step for all the following libraries:
       Selenium.WebDriver.ChromeDriver
       Selenium.Support
       Selenium.WebDriver
DEALING WITH X-PATHS
XPath uses path expressions to select nodes or node-sets in an XML document.
In this guide we use XPaths to parse and navigate websites. An XPath tells our program
where to find what we are looking for, such as text fields, buttons or texts.
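To make the idea of a path expression concrete, the sketch below runs two XPaths against a small hand-written XML snippet using the XmlDocument class that ships with .NET. The snippet, the class name XPathDemo and the method name SelectValues are invented for this example and are not taken from any real website.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathDemo
{
    // Evaluates an XPath against an XML string and returns the matched values.
    public static List<string> SelectValues(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var values = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            // Attribute nodes carry their text in Value, elements in InnerText.
            values.Add(node.Value ?? node.InnerText);
        }
        return values;
    }

    public static void Main()
    {
        string xml = "<page><div id=\"gallery\"><img src=\"a.png\"/><img src=\"b.png\"/></div>"
                   + "<img src=\"logo.png\"/></page>";

        // Only the images inside the gallery container: prints a.png and b.png
        foreach (string src in SelectValues(xml, "//div[@id='gallery']/img/@src"))
        {
            Console.WriteLine(src);
        }

        // Every image in the document: prints 3
        Console.WriteLine(SelectValues(xml, "//img/@src").Count);
    }
}
```

The first expression narrows the search to one container before picking out the images, which is the same trick the exercise below uses on a real page.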
The following is an exercise in how they work.
First you need an add-on for your browser that will help you test XPaths. For Chrome
you can use this and for Firefox you can use this. This guide uses Chrome, but the process
is similar in Firefox. I will use this website in this example. It is a page that contains
~100 images that we will find using XPaths.
Right-click on an image and click “Inspect”. This will open Chrome Developer Tools in a
panel to the right or at the bottom.
You should now see a sidebar in which the URL to the image is visible.
Now right-click the link and select “Copy” and then “Copy XPath”. The XPath to the image
should now be saved to your clipboard.
Open your XPath add-on.
Paste the XPath into the left part of the box that popped up. If the path doesn’t end with
“//img/@src”, add it. It should now look like this, with the URL to the image showing up
in the right part of the box. We now have a working XPath.
Here things get a bit more complicated, but once you have done it a few times things will
clear up.
We want all the images on the page. To do this we modify our XPath a bit. What we want
to do is identify a top-level element or “container” in which all of the images are
located. You can do this by moving your mouse cursor over the elements inside the
Developer Tools and looking for one where all the images get a blue overlay.
Here I have found a spot that seems to get what we want. Now we repeat the earlier steps:
copy the XPath of this element and paste it into the add-on. Remember to add “//img/@src”.
This is the result you should now be seeing. We now have an XPath which the program
can use to scrape the images.
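As a sketch of how a program might use such a container XPath, the example below collects the image URLs found inside a container using HtmlAgilityPack. The class name ImageScraper, the method name GetImageUrls and the idea of passing the container XPath as a parameter are assumptions for illustration. Instead of ending the path in “/@src”, the sketch selects the img elements and reads the src attribute in code, which avoids depending on how the parser treats attribute-only XPaths.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ImageScraper
{
    // containerXPath is the XPath you copied for the element that wraps the images.
    public static List<string> GetImageUrls(string html, string containerXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();
        // All <img> elements with a src attribute, anywhere below the container.
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes(containerXPath + "//img[@src]");
        if (images == null)
        {
            return urls; // nothing matched
        }

        foreach (HtmlNode img in images)
        {
            urls.Add(img.GetAttributeValue("src", string.Empty));
        }
        return urls;
    }
}
```

With a page whose images sit in an element with the id “gallery”, a call such as ImageScraper.GetImageUrls(html, "//*[@id='gallery']") would return one entry per image; the id used here is hypothetical.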
Now we are done and ready to start making our web automation applications!
How to make a program for web automation
In this chapter three common types of application are presented. The first is for
automation (login), the second is for web scraping (text) and the third is an example of
how to create useful statistics from scraped information. You will find the complete source
code under the “Code” chapter.
AUTOMATION APPLICATION
Now we can go back to Visual Studio and our project. The automation process will consist
of filling out a form on a website.
This is the website we will try this out on - Website
Step 1
Back in our project, let’s add the controls that we need.
Add:
        1 Button
        4 TextBoxes
        2 RadioButtons
        1 RichTextBox
        1 BackgroundWorker
It will end up looking something like this.
Step 2
Now go ahead and double-click the button on our form. This will take you to the button’s
click event handler and the code we want to work with. The purpose of the button is simply
to start our worker thread (BackgroundWorker). When you are writing code that does heavy
work you need to use a worker thread. Otherwise the code runs on the UI thread and makes
the program unresponsive and “laggy”.
To start our BackgroundWorker we add this code:
    if (!backgroundWorker1.IsBusy)
    {
        backgroundWorker1.RunWorkerAsync(2000);
    }
The rest of our code will be placed in our BackgroundWorker’s DoWork event handler.
Something we need to take care of first is that calling RunWorkerAsync while the
BackgroundWorker is already running throws an exception (InvalidOperationException), which
crashes the program if it is not handled. The code above makes sure that we don’t
accidentally start it twice by wrapping the call in an if-condition that checks whether the
BackgroundWorker is already running.
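The guard described above can be tried outside of a form with a small console sketch. The class name WorkerDemo and the method name RunOnce are invented for this example, and treating the argument as a delay in milliseconds is an assumption made only for the demo; the tutorial itself does not say what the value 2000 is used for. The sketch also shows that the value passed to RunWorkerAsync arrives in the DoWork handler as e.Argument.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public static class WorkerDemo
{
    // Starts a BackgroundWorker once, tries to start it again while it is busy,
    // and returns the value the worker produced.
    public static int RunOnce(int delayMs)
    {
        int result = 0;
        using (var done = new ManualResetEventSlim(false))
        using (var worker = new BackgroundWorker())
        {
            worker.DoWork += (sender, e) =>
            {
                int delay = (int)e.Argument; // the value given to RunWorkerAsync
                Thread.Sleep(delay);         // stand-in for heavy work
                e.Result = delay;
            };
            worker.RunWorkerCompleted += (sender, e) =>
            {
                result = (int)e.Result;
                done.Set();
            };

            if (!worker.IsBusy)
            {
                worker.RunWorkerAsync(delayMs); // starts the worker
            }
            if (!worker.IsBusy)
            {
                worker.RunWorkerAsync(delayMs); // skipped: the worker is still running
            }

            done.Wait(); // block until the worker has finished
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(RunOnce(200)); // prints 200
    }
}
```

Without the IsBusy check, the second call to RunWorkerAsync would throw instead of being skipped.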
Step 3
Now we need to do a bit of research and analysis of the website. What we want to find is:
      XPaths of the text boxes
      XPath of the button that sends the information in the form
Using the method of finding XPaths from the previous chapter, you should end up with
these:
//*[@id="name"]
//*[@id="city"]
//*[@id="phone"]
//*[@id="email"]
//*[@id="ff"]/input
Step 4
Now we go back to programming. Double-click the BackgroundWorker to get to the
DoWork event handler (this is where our main code will be written and executed). We start
by initializing and setting up our web driver (ChromeDriver).
Since we want neither the emulated browser nor the command prompt window to be visible,
we create a ChromeDriverService and a ChromeOptions object.
var service = ChromeDriverService.CreateDefaultService();
ChromeOptions option = new ChromeOptions();
Next, we add the commands to the service and option.
service.HideCommandPromptWindow = true;
option.AddArgument("--headless");
Now we can initialize the driver.
ChromeDriver driver = new ChromeDriver(service, option);
When interacting with a website we need a way of handling elements that may not exist yet
when we try to use them. We can do this by setting an implicit wait. This tells the
program how to deal with, for example, long loading times. If an expected element is
missing, the program will keep trying to find it for (in this example) 60 seconds before
throwing an exception.
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(60);
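An alternative to the implicit wait, not used in this tutorial, is an explicit wait with the WebDriverWait class from the Selenium.Support package. The sketch below waits for a single element and returns null instead of throwing when the time runs out. The class name WaitHelper and the method name WaitForElement are assumptions for this example, and the sketch assumes that no implicit wait has been set on the driver, since mixing the two kinds of wait is generally discouraged.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

public static class WaitHelper
{
    // Polls for the element until it is found or the timeout passes.
    // Returns the element, or null if it never appeared.
    public static IWebElement WaitForElement(IWebDriver driver, string xpath, int timeoutSeconds)
    {
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(timeoutSeconds));
        try
        {
            // Until retries the lambda while it throws "element not found".
            return wait.Until(d => d.FindElement(By.XPath(xpath)));
        }
        catch (WebDriverTimeoutException)
        {
            return null; // the element did not show up in time
        }
    }
}
```

A call such as WaitHelper.WaitForElement(driver, "//*[@id='name']", 10) lets the caller decide what to do when the element is missing, rather than handling an exception.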
Then, we have the program navigate to the website using our driver.
driver.Navigate().GoToUrl("https://demo.ftutorials.com/html5-contact-form/");
Now we can start interacting with the form and fill out the text fields. We do this by
creating a query for each field and inserting our XPaths.
This is for our text fields. The input is taken from our program’s text boxes.
IWebElement query = driver.FindElement(By.XPath("//*[@id='name']"));
query.SendKeys(textBox1.Text);
IWebElement query2 = driver.FindElement(By.XPath("//*[@id='city']"));
query2.SendKeys(textBox2.Text);
IWebElement query3 = driver.FindElement(By.XPath("//*[@id='phone']"));
query3.SendKeys(textBox3.Text);
IWebElement query4 = driver.FindElement(By.XPath("//*[@id='email']"));
query4.SendKeys(textBox4.Text);
To round things off we need to click the website’s submit button.
driver.FindElement(By.XPath("//*[@id='ff']/input")).Click();
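The four find-and-type pairs above follow the same pattern, so they could also be written as a loop. The sketch below is one possible way to do that and is not part of the tutorial’s own code; the class name FormFiller and the method name Fill are made up for this example.

```csharp
using System.Collections.Generic;
using OpenQA.Selenium;

public static class FormFiller
{
    // Types each value into the field located by its XPath.
    // Keys are XPaths, values are the texts to enter.
    public static void Fill(IWebDriver driver, IDictionary<string, string> valuesByXPath)
    {
        foreach (KeyValuePair<string, string> field in valuesByXPath)
        {
            IWebElement element = driver.FindElement(By.XPath(field.Key));
            element.Clear();               // remove any text already in the field
            element.SendKeys(field.Value); // type the new text
        }
    }
}
```

Inside the DoWork handler this could be called with a Dictionary<string, string> that maps "//*[@id='name']" to textBox1.Text, "//*[@id='city']" to textBox2.Text and so on.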
Step 5
That sums up the automation part. For this example we also want a way to make sure that
everything worked out as intended. To do this we are going to scrape some information from
the web page to which we are automatically directed after the form has been submitted
(https://demo.ftutorials.com/html5-contact-form/success.html).
In theory the program should now have arrived at that page. We are now going to save
all the HTML of the page. To do this we first declare and initialize a string to hold this
HTML. You can do this by using the “PageSource” property.
string renderedContent = driver.PageSource;
Now that we have the HTML saved it’s time to bring out our HTML parser
(HtmlAgilityPack). This is done in a few steps, the first one being to declare an
HtmlDocument, which translates our string into something that HtmlAgilityPack can
understand and work with.
HtmlAgilityPack.HtmlDocument doc1 = new HtmlAgilityPack.HtmlDocument();
Next, we load the HTML string into the document.
doc1.LoadHtml(renderedContent);
We now declare an HtmlNodeCollection. This is a collection of HTML nodes retrieved
using an XPath. In our case the collection will contain only one node, namely the success
message shown on the website.
In other cases (like when scraping images) the XPath can pick up more than one
result. You can test this yourself by going to Google Images and trying the XPath “//img”.
You will see more than one result. When this happens you can create a loop to go through
the collection and use the results however you see fit (more on this later). This is the
XPath to the success message on our page.
//*[@id='contact_form']/h1/text()
And then the code. Note that the code selects the h1 element itself (without the trailing
“/text()”) and reads its text later through InnerText.
HtmlNodeCollection resultCollection =
doc1.DocumentNode.SelectNodes("//*[@id='contact_form']/h1");
As mentioned above, we will only have one result in our node collection. We still create
a loop to retrieve it, since that is common practice.
Before starting the loop you need to make sure that the node collection isn’t empty (null).
This is done with an “if” condition. If the collection is null, the program skips the loop
and writes “Node is empty” in our RichTextBox instead. It’s always better to handle
potential errors like this than to deal with exceptions.
If the node collection isn’t null we can be sure that our loop is ready to go. To get our
result we use a “foreach” loop to work through the collection.
The result needs a bit of work before it is written to the RichTextBox, since the “result”
in our case is still of the type “HtmlNode” and we want readable text to display. We can
get this by reading the “InnerText” property of our result node. Because the code runs on
the worker thread, the update of the RichTextBox is wrapped in Invoke so that it executes
on the UI thread.
    if (resultCollection != null)
    {
        foreach (var result in resultCollection)
        {
            // Convert the HtmlNode to plain text and show it (on the UI thread)
            Invoke(new Action(() => richTextBox1.Text = result.InnerText));
        }
    }
    else
    {
        // SelectNodes returns null when the XPath matches nothing
        Invoke(new Action(() => richTextBox1.Text = "Node is empty"));
    }
And now we are done! Run the program and test it for yourself.
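One detail the walkthrough does not cover is closing the browser: the driver created in Step 4 is never shut down, so each run can leave a hidden Chrome process behind. A common way to handle this is to call Quit in a finally block, as in the sketch below. The class name DriverLifetime and the method name LoadPageSource are invented for this example, which only loads a page and returns its HTML.

```csharp
using OpenQA.Selenium.Chrome;

public static class DriverLifetime
{
    // Opens a headless browser, loads the page and always closes the browser again.
    public static string LoadPageSource(string url)
    {
        var service = ChromeDriverService.CreateDefaultService();
        service.HideCommandPromptWindow = true;

        var options = new ChromeOptions();
        options.AddArgument("--headless");

        ChromeDriver driver = new ChromeDriver(service, options);
        try
        {
            driver.Navigate().GoToUrl(url);
            return driver.PageSource;
        }
        finally
        {
            // Runs even if navigation throws, so no browser process is left behind.
            driver.Quit();
        }
    }
}
```

The same try/finally structure could be placed around the body of the DoWork handler.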
Code
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;
using OpenQA.Selenium.Support.UI;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace FirstWinformsApplication
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // Only start the worker if it is not already running
            if (!backgroundWorker1.IsBusy)
            {
                backgroundWorker1.RunWorkerAsync(2000);
            }
        }

        private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
        {
            // Set up a headless Chrome without a command prompt window
            var service = ChromeDriverService.CreateDefaultService();
            ChromeOptions option = new ChromeOptions();
            service.HideCommandPromptWindow = true;
            option.AddArgument("--headless");
            ChromeDriver driver = new ChromeDriver(service, option);
            driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(60);

            driver.Navigate().GoToUrl("https://demo.ftutorials.com/html5-contact-form/");

            // Fill out the form fields from the text boxes
            IWebElement query = driver.FindElement(By.XPath("//*[@id='name']"));
            query.SendKeys(textBox1.Text);
            IWebElement query2 = driver.FindElement(By.XPath("//*[@id='city']"));
            query2.SendKeys(textBox2.Text);
            IWebElement query3 = driver.FindElement(By.XPath("//*[@id='phone']"));
            query3.SendKeys(textBox3.Text);
            IWebElement query4 = driver.FindElement(By.XPath("//*[@id='email']"));
            query4.SendKeys(textBox4.Text);

            // Submit the form
            driver.FindElement(By.XPath("//*[@id='ff']/input")).Click();

            // Parse the resulting page and look for the success message
            string renderedContent = driver.PageSource;
            HtmlAgilityPack.HtmlDocument doc1 = new HtmlAgilityPack.HtmlDocument();
            doc1.LoadHtml(renderedContent);
            HtmlNodeCollection resultCollection =
                doc1.DocumentNode.SelectNodes("//*[@id='contact_form']/h1");

            if (resultCollection != null)
            {
                foreach (var result in resultCollection)
                {
                    Invoke(new Action(() => richTextBox1.Text = result.InnerText));
                }
            }
            else
            {
                Invoke(new Action(() => richTextBox1.Text = "Node is empty"));
            }
        }
    }
}