We've Moved to GitHub

THIS PROJECT HAS MOVED TO GITHUB - PLEASE UPDATE YOUR LINKS
New Address: http://github.com/axefrog/XBrowser

XBrowser is a "headless" web browser written in C# for .NET applications. It is designed to allow automated, remote-controlled and test-based browsing sessions without any rendering overhead.

Whilst this is a new CodePlex project, it is based on what I learned developing an older version which was used privately throughout several commercial projects. That version is included in the source code to be used while XBrowser is under development. It has a good amount of maturity and debugging already behind it, so it should serve as a solid fill-in until XBrowser is ready for practical use. Note that the legacy version (XHtmlHttpBrowser) has no support for JavaScript; it is purely HTML-based, with session/cookie persistence, GET and POST operations, HTML parsing and an internal LINQ-to-XML-based document querying system.

The XBrowser project is being built on .NET Framework 4, now that it has been officially released by Microsoft. There are no plans to support .NET 3.5 or earlier.

Requirements

Checking out the source for this project also requires a checkout of the source for the axefrogcore project. Both projects should be checked out side by side in the same parent directory, e.g.:

/XBrowser-Project
/XBrowser-Project/axefrogcore
/XBrowser-Project/xbrowser

The dependency on axefrogcore will hopefully be removed in a future release once XBrowser completely replaces the functionality of the legacy XHtmlHttpBrowser class that is also included in the source code.

Current Status (27-Apr-2010)

  • The HTML5 W3C specification is being used as the basis for the browser. Noncompliant pages are "upgraded" to HTML5 during parsing as needed.
  • A new, highly permissive parser is partially complete; it does a more correct job of processing broken HTML than either HtmlTidy or HtmlAgilityPack. In the initial version, DTDs are discarded, as I can think of very few use cases for this type of browser where preserving legacy HTML functionality would be advantageous.
  • I've created the full skeleton of HTML5 element classes and am currently filling out the code that loads the elements of a page. Pages are parsed completely (so far using HtmlAgilityPack, which is shortly to be replaced by an internal parser) and mapped side by side to both an XDocument structure and a native structure that uses XBrowser's internal set of HTML-specific elements, which will be complete with their own context-relevant methods, events and properties. A document structure can be navigated using LINQ-to-XML, and most individual XElements are annotated with the XBrowserElement instance they are mapped to. Likewise, any XBrowserElement exposes the XElement object it is mapped to.
  • A jQuery (SizzleCS) compatible selector engine is partially complete for querying the elements of a page from .NET code, which will make it very easy to locate specific elements in a page for manipulation. Queries such as body > ul:first > li[myclass~=test] will be possible.
  • The browser is aware of multiple "windows" that it can create and can thus pretend to be a tabbed browser and/or a browser with popup windows. Each window will have knowledge of the central cookie store and cache for the parent XBrowser instance. The window design is also such that it will cater for contained windows such as in framesets and iframes, when they are implemented.
  • The browser can now navigate asynchronously to a web address and properly handles temporary and permanent redirects. Events exist to hook completion of the navigation (or to detect if an exception is thrown). Secondary external resources such as scripts, images and so forth are not yet loaded.
  • The cookie store can now parse cookie headers completely, which is half the job. The other half is to write out cookies for a given request and also to maintain the store throughout a browser session. Later I'll implement an option to persist the store for future browser sessions.
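
The two halves of the cookie store described in the last item (parsing Set-Cookie headers, then writing out cookies for a given request) can be sketched roughly as follows. This is not XBrowser's code: it is a minimal, language-neutral illustration in Python, with matching rules loosely modelled on RFC 6265. The names Cookie, parse_set_cookie and cookie_header are hypothetical, and expiry handling (Expires, Max-Age) is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cookie:
    """Hypothetical cookie record; not a type from XBrowser."""
    name: str
    value: str
    domain: str              # host the cookie applies to (no leading dot)
    path: str = "/"
    secure: bool = False
    host_only: bool = True   # True when the header carried no Domain attribute


def parse_set_cookie(header: str, request_host: str) -> Optional[Cookie]:
    """Parse the value of one Set-Cookie header (first half of the job)."""
    parts = [p.strip() for p in header.split(";")]
    name, sep, value = parts[0].partition("=")
    if not sep or not name.strip():
        return None  # no name=value pair, so the header is ignored
    cookie = Cookie(name.strip(), value.strip(), request_host.lower())
    for attr in parts[1:]:
        key, _, val = attr.partition("=")
        key, val = key.strip().lower(), val.strip()
        if key == "domain" and val:
            cookie.domain = val.lstrip(".").lower()
            cookie.host_only = False
        elif key == "path" and val.startswith("/"):
            cookie.path = val
        elif key == "secure":
            cookie.secure = True
        # Other attributes (Expires, Max-Age, HttpOnly) are skipped in this sketch.
    return cookie


def domain_matches(cookie: Cookie, host: str) -> bool:
    host = host.lower()
    if cookie.host_only:
        return host == cookie.domain
    return host == cookie.domain or host.endswith("." + cookie.domain)


def path_matches(cookie_path: str, request_path: str) -> bool:
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # "/app" must match "/app/page" but not "/application".
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def cookie_header(store: List[Cookie], host: str, path: str, is_https: bool) -> str:
    """Build the Cookie request header value (second half of the job)."""
    matching = [
        c for c in store
        if domain_matches(c, host)
        and path_matches(c.path, path)
        and (is_https or not c.secure)
    ]
    # Cookies with longer (more specific) paths are listed first.
    matching.sort(key=lambda c: len(c.path), reverse=True)
    return "; ".join(f"{c.name}={c.value}" for c in matching)


if __name__ == "__main__":
    store = [
        parse_set_cookie("sid=abc123; Domain=.example.com; Path=/app; Secure", "www.example.com"),
        parse_set_cookie("theme=dark", "www.example.com"),
    ]
    print(cookie_header(store, "www.example.com", "/app/page", is_https=True))
```

Keeping the host-only flag separate from the domain is what stops a cookie set without a Domain attribute from leaking to sibling subdomains when the request header is built.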

NOTE: The actual XBrowser class is not yet complete enough to be used for anything practical. In the meantime, use the legacy XHtmlHttpBrowser class, which does all the basics without JavaScript support.
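
To illustrate the side-by-side mapping mentioned in the status list (each XML node linked to a native element object, and each native element exposing its XML node), here is a rough sketch of the idea. It is written in Python with xml.etree.ElementTree purely as a stand-in; XBrowser itself uses XElement annotations in LINQ-to-XML, and the class and function names below (BrowserElement, AnchorElement, map_document) are hypothetical, not part of XBrowser.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


class BrowserElement:
    """Hypothetical native element wrapper (stand-in for an XBrowserElement)."""

    def __init__(self, node: ET.Element):
        self.node = node  # back-reference to the XML node this element is mapped to

    @property
    def tag_name(self) -> str:
        return self.node.tag.upper()


class AnchorElement(BrowserElement):
    """Example of a context-relevant property on a specific element type."""

    @property
    def href(self) -> Optional[str]:
        return self.node.get("href")


# Tag name -> wrapper class; tags not listed fall back to BrowserElement.
ELEMENT_TYPES = {"a": AnchorElement}


def map_document(root: ET.Element) -> Dict[ET.Element, BrowserElement]:
    """Create one native wrapper per XML node, keyed by the node itself."""
    return {
        node: ELEMENT_TYPES.get(node.tag, BrowserElement)(node)
        for node in root.iter()
    }


if __name__ == "__main__":
    root = ET.fromstring(
        '<html><body><ul><li><a href="/home">Home</a></li></ul></body></html>'
    )
    mapping = map_document(root)
    link_node = root.find("body/ul/li/a")  # navigate the XML side
    link = mapping[link_node]              # hop across to the native side
    print(link.tag_name, link.href)
    print(link.node is link_node)          # and back again
```

The dictionary plays the role that annotations play on an XElement: code that navigates the XML tree can reach the richer native object, while the native object's back-reference leads to the XML node again.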

To Do List

  • Create a custom cookie management implementation to replace use of CookieContainer, which is horribly, horribly broken - Partially Complete
  • Replace HtmlAgilityPack with a better HTML parser to overcome issues with displaced form elements - In Progress
  • Integrate the JINT project to begin basic JavaScript support or, preferably, if it is complete enough, the IronJS project, which looks to have much better performance than JINT.
  • Implement JavaScript hooks for HTML elements as per W3C recommendations - In Progress
  • Create a list of tests that the browser must pass on simple, intermediate and complex websites - In Progress As Needed

Last edited Aug 2, 2010 at 2:29 PM by NathanRidley, version 26