Analysing the risk factors of buggy SSL validation

 

Introduction

SSL (Secure Sockets Layer) is the standard security technology for internet communication between a client and a server. The client can be a web browser or a non-browser application. Depending on the application, SSL protects credit-card details, personal information, login credentials and similar data as they cross the network. An SSL certificate is needed to create the secured connection between the parties [1].

How SSL Works

Once a TCP connection is established, the client initiates the SSL handshake. It sends its capabilities: the SSL versions, cipher suites and compression methods it supports. The server picks the highest SSL version supported by both parties and chooses a cipher suite and, optionally, a compression method from those offered by the client.

 

After this, the server sends its certificate. The client must trust this certificate directly, or it must be vouched for by a trusted third-party authority. Once the server’s certificate has been verified, a key exchange takes place; depending on the cipher suite selected, it may rely on the server’s public key. The client and the server then both compute the key used for symmetric encryption. Following this, the client tells the server that further communication will be encrypted and sends an encrypted, authenticated message to the server.

 

The server decrypts the message and verifies that its Message Authentication Code (MAC) is correct. It then sends a message of its own, which the client verifies in the same way.

 

The handshake is now finished and the client can communicate with the server securely. To close the connection, both parties send a close_notify alert. If an attacker tries to terminate the connection early by injecting a TCP FIN, both client and server know that the connection was torn down improperly. Such an attack interrupts the connection but does not compromise its security [2, 3].

 

Thus, a proper SSL connection requires validating the hardware and software involved. For this purpose, security experts have come up with a concept called the chain of trust [4].

 

Essential parts of SSL validation

Chain of trust verification:

Each certificate must be issued by a trusted Certificate Authority (CA). If the issuer is not itself a trusted CA, its own certificate must have been issued by another authority, and so on, forming a chain that leads all the way up to a trusted CA.

 

Hostname verification:

SubjectAltName (SAN) is an X.509 certificate extension that carries identity information such as host names and e-mail addresses.

According to RFC 2818, the SubjectAltName extension should be checked first when verifying the host name. The Common Name field can also be checked, but this practice is deprecated and survives only for backward compatibility. Most SSL implementations nevertheless check the fields the other way round, which introduces bugs in host name verification.
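The precedence that RFC 2818 asks for can be sketched in a few lines of Python. This is only an illustration, not a replacement for a library’s own check: the function names dns_name_matches and hostname_matches_cert are hypothetical, the certificate is assumed to be in the dictionary form returned by Python’s ssl.SSLSocket.getpeercert(), and wildcard handling is simplified to a single leftmost label.

```python
def dns_name_matches(pattern, hostname):
    """Compare one certificate name with the requested host name."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        # Simplification: the wildcard covers exactly one leftmost label.
        head, sep, tail = hostname.partition(".")
        return bool(head) and sep == "." and tail == pattern[2:]
    return pattern == hostname


def hostname_matches_cert(cert, hostname):
    """Check SubjectAltName first; use Common Name only if no DNS SAN exists."""
    san_dns = [value for kind, value in cert.get("subjectAltName", ())
               if kind == "DNS"]
    if san_dns:
        # A DNS SubjectAltName is present, so the Common Name is ignored.
        return any(dns_name_matches(name, hostname) for name in san_dns)
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName" and dns_name_matches(value, hostname):
                return True
    return False


# Hypothetical certificate whose Common Name and SAN disagree.
cert = {
    "subject": ((("commonName", "evil.example"),),),
    "subjectAltName": (("DNS", "good.example"),),
}
print(hostname_matches_cert(cert, "good.example"))
print(hostname_matches_cert(cert, "evil.example"))
```

An implementation that checked the Common Name first would accept evil.example for this certificate, which is exactly the kind of bug described above.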

 

CRL (Certificate revocation list) check and X.509 extension check:

Most applications do not bother with CRL checking, and SSL libraries differ in how they support it. OpenSSL, for example, implements certificate revocation checking, but the application must supply the certificate revocation list, which many applications neglect to do. Other libraries, such as JSSE, require the application to check the revocation list on its own, and most applications do not.
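The same opt-in behaviour can be observed in Python’s standard ssl module, which is used here purely as an illustration (it is not one of the libraries discussed in this article). The sketch below only inspects the context flags and opens no connection.

```python
import ssl

# A default client context: chain-of-trust and hostname checks are on.
ctx = ssl.create_default_context()

# CRL checking is an opt-in flag and is not part of the defaults.
crl_on_by_default = bool(ctx.verify_flags & ssl.VERIFY_CRL_CHECK_LEAF)

# Opting in. The application must still supply the CRL itself, for
# example with ctx.load_verify_locations(cafile=...); without a CRL
# loaded, certificate validation fails.
ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
crl_on_after_opt_in = bool(ctx.verify_flags & ssl.VERIFY_CRL_CHECK_LEAF)

print("CRL check on by default:", crl_on_by_default)
print("CRL check after opt-in: ", crl_on_after_opt_in)
```

As with OpenSSL, turning the flag on is only half of the job: the revocation list has to come from the application.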

The following sections give examples of libraries and tools where SSL validation is done incorrectly or not at all.

SSL Libraries

OpenSSL:
OpenSSL is one of the most widely used libraries for handling and managing certificates. OpenSSL does not perform host name verification itself. This is reasonable, because applications have different notions of what identifies a peer: some use a host name, some an IP address and some an e-mail address. But because OpenSSL lays down no hard and fast rule for host name verification, many applications built on it fail to verify the host name at all.

Also, by default OpenSSL does not raise a run-time error for a self-signed certificate or an invalid chain of trust. SSL_connect is the OpenSSL function that establishes a secure connection. For some validation errors this function reports failure, but for others it reports success and merely sets internal flags. It is the application’s responsibility to check those flags and complete the validation.

 

Example:

Trillian, a popular instant messenger client, uses OpenSSL to establish secure connections. It calls the SSL_connect function and establishes the connection based on the return value alone. It never calls SSL_get_verify_result to inspect the outcome of certificate verification, and it does not enable the SSL_VERIFY_PEER mode that would make the handshake fail on an invalid certificate. Trillian therefore accepts certificates that have not been properly verified and is vulnerable to a man-in-the-middle attack.

 

JSSE:

Java Secure Socket Extension (JSSE) is a library that provides the interfaces through which Java applications, including Android mobile apps, establish SSL connections. SSLSocketFactory is a low-level API that may or may not perform host name verification, depending on the implementation.

 

 


private void checkTrusted(X509Certificate[] chain, String authType, Socket socket, boolean isClient)
        throws CertificateException {
    ...
    // check endpoint identity
    String identityAlg = sslSocket.getSSLParameters().
            getEndpointIdentificationAlgorithm();

    // host name verification is skipped silently when no algorithm is set
    if (identityAlg != null && identityAlg.length() != 0) {
        String hostname = session.getPeerHost();
        checkIdentity(hostname, chain[0], identityAlg);
    }
}

 

SSL clients that use a raw SSLSocketFactory get a null algorithm field (identityAlg) in the checkTrusted() function above, so host name verification is not performed. The job of host name verification is delegated to the application running on top of JSSE. This behaviour is not explained in the JSSE API documentation.

 

Example of bugs in SSL for non-browser software

Many SSL bugs have been found in non-browser software, in areas such as merchant SDKs, cloud services, mobile applications and middleware web services [5].

Merchant SDKs: Amazon Flexible Payments Service

According to [5], Amazon FPS provides an SDK that a merchant’s server can include to carry customers’ payment details to the payment gateway. The part of the SDK that establishes an SSL connection through the libcurl library misunderstands the cURL options for certificate validation:


curl_setopt($curlHandle, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlHandle, CURLOPT_SSL_VERIFYHOST, true);
...
//Execute the request
$response = curl_exec($curlHandle);

The default value of CURLOPT_SSL_VERIFYHOST is 2, but when true is passed as above, it is silently converted to 1. With this value, the check only requires that the certificate contains some common name, whether or not it matches the requested host name, which was not the intention. Any code that uses this SDK is therefore vulnerable to a man-in-the-middle attack.
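The trap is not specific to PHP: in any language where a boolean converts to an integer, true ends up meaning 1. The Python sketch below shows the conversion and one defensive way an option setter could refuse booleans. The constant names and the set_verify_host function are hypothetical and are not part of libcurl or of the Amazon SDK.

```python
# Hypothetical names for the three levels of the host check.
VERIFY_HOST_OFF = 0            # no host name check at all
VERIFY_HOST_NAME_PRESENT = 1   # a common name exists, but is not compared
VERIFY_HOST_NAME_MATCHES = 2   # the name must match the requested host


def set_verify_host(options, level):
    """Store the host-check level, refusing booleans and unknown values."""
    # bool is a subclass of int in Python, so it must be tested first.
    if isinstance(level, bool):
        raise TypeError("pass 0, 1 or 2, not a boolean")
    if level not in (VERIFY_HOST_OFF, VERIFY_HOST_NAME_PRESENT,
                     VERIFY_HOST_NAME_MATCHES):
        raise ValueError("unknown host verification level: %r" % (level,))
    options["verify_host"] = level
    return options


# A boolean silently turns into the weaker level 1.
print(int(True) == VERIFY_HOST_NAME_PRESENT)

# The strict setter accepts the explicit level.
print(set_verify_host({}, VERIFY_HOST_NAME_MATCHES))
```

Rejecting the boolean outright turns a silent downgrade into an error that the developer sees the first time the code runs.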

Cloud Services: Rackspace

The Rackspace cloud services app uses the ASIHTTPRequest library (via the OpenStack iOS cloud framework) to set up secure connections.

 

This library provides a configuration variable, ValidatesSecureCertificate, which is set to 1 by default. Setting it to 0 turns off both chain-of-trust and host name verification. The Rackspace app does not present the user with an option to change validateSSLSwitch; instead, the Objective-C allocator initializes it to 0. This turns on ignoreSSLValidation in ASIHTTPRequest, which in turn sets ValidatesSecureCertificate to 0 and disables certificate validation [5].

 

The paper concludes that SSL connections established by the Rackspace apps on iOS are insecure against a man-in-the-middle attack.

Middleware Web Services: Apache Axis, Axis 2, Codehaus XFire

As described in [5], Apache Axis, Axis 2 and XFire rely on SSLSocketFactory to establish SSL connections (because they use Apache HttpClient).

 

SSLSocketFactory does not perform host name verification (this was fixed later in HttpClient 4).

 

As a result, only the chain of trust is validated and the host name is never verified.

Mobile Apps: APPLE mobile

As described in [6], Apple had a security flaw in its SSL certificate validation, caused by an extra goto fail; statement that makes the code skip the actual signature verification. The code snippet below is from Apple’s open-source library [6]:


SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    ......
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;    /* duplicated line: always taken, with err == 0 */
    ......
    err = sslRawVerify(ctx,
                       ctx->peerPubKey,
                       dataToSign,       /* plaintext */
                       dataToSignLen,    /* plaintext length */
                       signature,
                       signatureLen);

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}
.....
.....	

 

Recommendations and tips to avoid bugs

For application developers

  1. Developers need to check the default settings of the SSL libraries they use. In some cases the default values disable certificate validation.
  2. When adopting or extending standard SSL libraries, applications must make sure that every relevant step of certificate validation is actually performed.
  3. Testing of SSL certificate validation needs to cover the chain of trust, the host name check, the signature check and certificate expiry.
  4. Black-box and adversarial testing should present the application with invalid certificates as well as the intended set of correct ones.
  5. Developers should watch out for outdated libraries and protocol versions. SSLv3 is vulnerable to the POODLE attack: during the SSL handshake an attacker can force the client to fall back to SSLv3 and then exploit the vulnerability. For this reason, eBay has disabled support for SSLv3 [7].
  6. Developers and testers should not turn off SSL validation during development or testing; doing so leaves the validation code untested and leads to security vulnerabilities.
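Some of these checks can be automated. As a rough sketch, assuming a Python client built on the standard ssl module (which is not one of the libraries discussed above), a small audit function can flag a context whose settings switch validation off. The function name audit_context and the wording of the findings are hypothetical.

```python
import ssl


def audit_context(ctx):
    """Return a list of validation problems found in an SSLContext."""
    findings = []
    if ctx.verify_mode != ssl.CERT_REQUIRED:
        findings.append("chain of trust is not enforced")
    if not ctx.check_hostname:
        findings.append("host name is not verified")
    if ctx.minimum_version < ssl.TLSVersion.TLSv1_2:
        findings.append("protocol versions older than TLS 1.2 are allowed")
    return findings


# A context left at its secure settings, with an explicit version floor.
strict = ssl.create_default_context()
strict.minimum_version = ssl.TLSVersion.TLSv1_2

# The same context after the kind of shortcut taken during development.
careless = ssl.create_default_context()
careless.minimum_version = ssl.TLSVersion.TLSv1_2
careless.check_hostname = False      # must be cleared before CERT_NONE
careless.verify_mode = ssl.CERT_NONE

print(audit_context(strict))
print(audit_context(careless))
```

A check like this does not replace testing against real invalid certificates, but it can catch a validation switch that was turned off for development and never turned back on.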

For SSL library developers

  1. SSL libraries need to be more explicit about the semantics of their APIs. They should ship good API documentation (Javadoc, for example) that explains the options, parameters and their possible values, along with examples of use.
  2. The responsibility for managing SSL connections should be abstracted away from the application layer.
  3. SSL libraries should set safe default values for options and parameters, so that connections are secure out of the box.
  4. A run-time exception should be raised whenever an important validation step is given a null or empty input, instead of silently skipping the validation.
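The last point can be made concrete with a small sketch of a fail-closed identity check. It is written in Python for brevity and is not how JSSE or any other library implements the check; the names IdentityCheckError and check_endpoint_identity are hypothetical, and matching is reduced to a case-insensitive exact comparison.

```python
class IdentityCheckError(Exception):
    """Raised when the peer's identity cannot be confirmed."""


def check_endpoint_identity(hostname, cert_names, identity_algorithm):
    """Verify the host name, refusing to skip the check on empty input."""
    if not identity_algorithm:
        # Fail closed: a missing algorithm is an error, not a shortcut.
        raise IdentityCheckError("no endpoint identification algorithm set")
    if not hostname:
        raise IdentityCheckError("no host name to verify")
    allowed = {name.lower() for name in cert_names}
    if hostname.lower() not in allowed:
        raise IdentityCheckError(hostname + " is not named in the certificate")
    return True


print(check_endpoint_identity("example.org", ["example.org"], "HTTPS"))

try:
    check_endpoint_identity("example.org", ["example.org"], None)
except IdentityCheckError as error:
    print("rejected:", error)
```

With this design, a caller who forgets to configure the algorithm finds out at the first connection attempt rather than shipping a client that accepts any certificate.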

 

 

Conclusion:

A good API design for SSL is important to ensure that the API is not misinterpreted during use. API designers should take all essential and critical parts of an SSL connection into consideration. There should be no room for overriding these critical parts of a secure connection.

 

It is dangerous to rely on standard libraries for secure SSL connections without checking their APIs for options, parameters and default values.

 

Man-in-the-middle attacks should be part of the testing strategy for non-browser applications.

Co-Authors

  • Priti Swami
  • Karthik G Venkatesan

References:

 

XHTML Well-formedness Validation with Prolog

Here I present a Prolog program that checks the well-formedness of an XHTML document. An XHTML document is well-formed when its whole text follows the syntactic rules that the XHTML specification labels as well-formedness rules.

 

Detailed Working Process

My Prolog program reads its input from a file, xhtml_nodes.txt, which contains a plain list (not a Prolog list) of Prolog terms. It reads those terms one by one and puts them in a Prolog list. The result is an XHTML document represented as a Prolog list, with each HTML element translated into Prolog terms.

To run the validator, compile the file main.pl and run it with the following commands:

?- [main].

?- main.

 

The file main.pl contains a call to the predicate ensure_loaded/1, which loads four other files: readFile.pl, ncount.pl, dcg_rules.pl and dcg_lexicon.pl.

 

:- ensure_loaded([readFile, ncount, dcg_rules, dcg_lexicon]).

 

They work as their names imply:

readFile.pl reads xhtml_nodes.txt and returns a list of nodes.

ncount.pl counts the number of elements in the list.

dcg_lexicon.pl contains the lexicon needed by the DCG rules. Note that lexicon entries are not generated for every XHTML element.

All the elements are grouped according to their behavior. The covered elements are listed in Appendix 1.

dcg_rules.pl is the heart of the program, where the validation rules from the XHTML specification are stored. They are written as DCG rules in Prolog that validate the list of nodes, which represents the order of the XHTML elements in the input file. Note that not all the rules of the XHTML specification are implemented.

 

The rules that I have covered are listed below:

  1. An XHTML document should be constructed from html, head and body element nodes.
  2. The root element of the document must be html.
  3. An html element must have a head and a body.
  4. A document must have a title in the head.
  5. XHTML elements must be closed.
  6. XHTML elements must be properly nested.
  7. Empty elements must be terminated.
  8. The body element must contain a block element or a series of block elements, or be empty.
  9. A block element can contain a series of block elements or a series of inline elements, or be empty.
  10. Tables (a special type of block element) should only use table, tr and td elements, and optionally caption.
  11. An inline element can contain a series of inline elements or be empty, but cannot contain a block element.
  12. Some special inline elements cannot contain other inline elements (input, textarea, br).
  13. A select element (a special type of inline element) shall contain N* options.
  14. An anchor (a) element can contain certain elements: img, strong, span, i, em, b, caption and label. Among them, some are self-contained inline elements and some are container inline elements.
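Rules 5 and 6 (every element is closed and elements are properly nested) are the easiest to picture outside Prolog. The sketch below is a Python illustration, not part of the project. It assumes that each element is represented by a pair of terms such as p_start and p_stop, and the function name check_nesting is hypothetical. It uses an explicit stack where the DCG relies on recursion.

```python
def check_nesting(terms):
    """Return True when every xxx_start has a matching, well-nested xxx_stop."""
    stack = []
    for term in terms:
        name, _, kind = term.rpartition("_")
        if kind == "start":
            stack.append(name)
        elif kind == "stop":
            # The element being closed must be the most recently opened one.
            if not stack or stack.pop() != name:
                return False
        else:
            return False  # not a start or stop term
    return not stack      # anything left on the stack was never closed


well_nested = ["p_start", "b_start", "b_stop", "p_stop"]
overlapping = ["p_start", "b_start", "p_stop", "b_stop"]
print(check_nesting(well_nested))
print(check_nesting(overlapping))
```

The content rules (which element may appear inside which) need the grammar; a stack alone only covers closing and nesting.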

 

The implementation of these rules is commented in the Prolog program code. All the code is in Appendix 2.

 

There is also a C#.NET program that reads a raw XHTML document (i.e. .html, .htm) and prepares the input for the Prolog program: a file that contains a plain list of Prolog terms.

For example, the following XHTML chunk is transformed as shown below.

 

<html lang="en-US" xmlns="http://www.w3.org/1999/xhtml">

<head>

<title>HTML Tutorial</title>

<link rel="stylesheet" type="text/css" href="./sample1_files/stdtheme.css">

</head>

<body>

</body>

</html>

 

The translated prolog terms

html_start.

head_start.

title_start.

title_stop.

link_start.

link_stop.

head_stop.

body_start.

body_stop.

html_stop.

 

There are normally two types of terms:

  • An opening term (xxx_start.) marks the start of an element.
  • A closing term (xxx_stop.) marks the end of an element.
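To make the translation concrete, here is a rough Python equivalent of what the C#.NET tool produces. It is a sketch, not the project’s code: it uses the standard html.parser module instead of HtmlAgilityPack, the names TermEmitter and to_terms are hypothetical, and the set of void elements is an assumption chosen to match the sample above.

```python
from html.parser import HTMLParser

# Elements written without a closing tag still get a start/stop pair.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class TermEmitter(HTMLParser):
    """Collect xxx_start / xxx_stop terms while parsing markup."""

    def __init__(self):
        super().__init__()
        self.terms = []

    def handle_starttag(self, tag, attrs):
        self.terms.append(tag + "_start")
        if tag in VOID_ELEMENTS:
            self.terms.append(tag + "_stop")

    def handle_endtag(self, tag):
        # Void elements were already closed in handle_starttag.
        if tag not in VOID_ELEMENTS:
            self.terms.append(tag + "_stop")


def to_terms(markup):
    parser = TermEmitter()
    parser.feed(markup)
    parser.close()
    return parser.terms


SAMPLE = """<html lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>HTML Tutorial</title>
<link rel="stylesheet" type="text/css" href="./sample1_files/stdtheme.css">
</head>
<body>
</body>
</html>"""

for term in to_terms(SAMPLE):
    print(term + ".")
```

Each printed line ends with a full stop so that Prolog’s read/1 can consume the output term by term.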

 

The UI of the program is shown in fig. 1. Browse to an .html or .htm file with the file button, then click the read button to produce the output, which is shown in the textbox and the list below the read button. The main output, a text file called xhtml_nodes.txt, is saved in your Prolog directory (the “Prolog” directory under “My Documents”, e.g. C:\Users\rizvis\Documents\Prolog\xhtml_nodes.txt).

Figure_1

 

 

How to run

To start a test, pick an XHTML file (a normal HTML file will also work). Start the C#.NET application and generate an xhtml_nodes.txt file. The input and output file paths are shown in the application. If the xhtml_nodes.txt file is not in your Prolog directory, move it there. Now compile the main.pl file and run the main predicate (?- main.). Make sure all the supporting files (readFile.pl, ncount.pl, dcg_rules.pl, dcg_lexicon.pl) are alongside main.pl. If the document is valid you will get the text “Valid Document” and true; otherwise you get false.

Figure_2

 

Conclusion

The XHTML specification has so many rules that it is quite hard to implement all of them in a one-person project. But I have tried my best to cover the major rules that dictate XHTML structure and well-formedness.

 

References

Tutorial by Paul Brna – http://homepages.inf.ed.ac.uk/pbrna/prologbook/index.html

Tutorial Learn Prolog Now – http://www.learnprolognow.org/lpnpage.php?pageid=top

Sources of validation rules are W3C organization – http://www.w3.org/TR/xhtml2/mod-document.html

XHTML 1 and 2 specification – http://www.w3.org/TR/xhtml1/ and http://www.w3.org/TR/xhtml2/

List of Block Elements – http://www.cs.sfu.ca/CourseCentral/165/sbrown1/wdgxhtml10/block.html

List of Inline element – http://www.cs.sfu.ca/CourseCentral/165/sbrown1/wdgxhtml10/inline.html

 

 

 

 

Appendix 1

All the elements that have been covered in the project

Block level Elements

  • div – Generic block-level container
  • h1 – Level-one heading
  • h2 – Level-two heading
  • h3 – Level-three heading
  • h4 – Level-four heading
  • h5 – Level-five heading
  • h6 – Level-six heading
  • hr – Horizontal rule
  • p – Paragraph
  • pre – Preformatted text

 

Special block level element

 

Inline elements

  • b – Bold text
  • code – Computer code
  • em – Emphasis
  • i – Italic text
  • span – Generic inline container
  • strong – Strong emphasis
  • caption – Placeholder for caption text

 

Self-inline

  • img – Inline image
  • textarea – Multi-line text input
  • input – Form input
  • br – Line break
  • option – An option in the select element
  • label – Form field label
  • a – Anchor

Special Inline –

select – Option selector

 

 

 

 

Appendix 2

Prolog program code

File main.pl

% Main file to start the program.
% Coded by Rizvi Hasan
% Date 20141013

% Load the predicates of other files
:- ensure_loaded([readFile, ncount, dcg_rules, dcg_lexicon]).

% Run the XHTML test in a line of code
main :-
    readFile('xhtml_nodes.txt', Y), nl, nl,
    write(Y), nl, nl,
    ncount(Y),
    doc(Y, []), nl, nl,
    write('Valid Document'), nl, nl, !.

% Frequently used commands
% [main]. main.
% [dcg_rules, main]. main.

 

 

File readFile.pl

% For reading a file of a list of terms
% Coded by Rizvi Hasan
% Date 20141013
% readFile(+,-).  example: readFile('xhtml_nodes.txt',Y),nl,write(Y),nl,nl

% Open a file in reading mode, read all its terms and close it again.
readFile(F, Out) :-
    open(F, read, Strm),
    reading(Strm, Out),
    close(Strm).

% Read from the stream and store in a list.
reading(Strm, Out) :- reading1(Strm, [], Out1), reverse(Out1, Out).

% Stop once end_of_file has been read; it is dropped from the result.
reading1(_, [end_of_file|Acc], Acc).
reading1(Strm, Acc, Out) :-
    read(Strm, X1),
    reading1(Strm, [X1|Acc], Out), !.

% Reverse the list.
reverse(X, X1) :- reverse(X, [], X1).
reverse([], A, A).
reverse([H|T], A, Y) :- reverse(T, [H|A], Y).

 

File ncount.pl

% Extra information about the XHTML document node count
% Coded by Rizvi Hasan
% Date 20141013

% ncount(+) prints the total number of nodes.
% nodecount(+,+,-) counts them with an accumulator.
ncount(Y) :- nodecount(Y, 0, N), write(['Node Count', N]).

nodecount([], Acc, Acc).
nodecount([_|T], Acc, N) :- Acc1 is Acc + 1, nodecount(T, Acc1, N).

 

File dcg_rules.pl

% Grammar rules for XHTML validation.
% Coded by Rizvi Hasan
% Date 20141013

% example execution:
% doc([html_start,head_start,title_start,title_stop,head_stop,
%      body_start,body_stop,html_stop],[])

doc --> html_start, htmlCont, html_stop.

htmlCont --> head, body.
head --> head_start, headElm, title, headElm, head_stop.
title --> title_start, title_stop.

headElm --> [].
headElm --> meta_start, meta_stop, headElm.
headElm --> link_start, link_stop, headElm.

body --> body_start, seris0, body_stop.

% Block element and inline element validation.
% Series of mixed block and inline elements
seris --> [].
seris --> blockElm, seris.
seris --> inlineElm, seris.

% Series of block elements
seris0 --> [].
seris0 --> blockElm, seris0.

% Series of inline elements
seris1 --> [].
seris1 --> inlineElm, seris1.

% Series of elements that <a> can contain: specifically img, strong,
% label etc.
serisA --> [].
serisA --> inlineElm(A1),
    { A1 == strong ; A1 == label ; A1 == em ; A1 == i ; A1 == b ;
      A1 == span ; A1 == caption ; A1 == img },
    serisA.

% A block element must have a start element and an end element
% and can contain N* block or inline elements.
% <table> is a special block level element.
blockElm --> blockElm_Start(other,X), seris, blockElm_End(other,X).
blockElm --> blockElm_Start(table), caption, tableHeader, serisRow, blockElm_End(table).

% An inline element must have a start element and an end element and
% can contain N* inline elements.
inlineElm --> inlineElm_Start(other,Y), seris1, inlineElm_End(other,Y).
% Some inline elements should not contain any other inline elements.
inlineElm --> inlineElm_Start(self,Z), inlineElm_End(self,Z).
% <select> is a special type of inline element which should contain
% only options.
inlineElm --> inlineElm_Start(select), serisOption, inlineElm_End(select).
% <a> is a special type of inline element which should contain some
% specific inlines.
inlineElm --> inlineElm_Start(a), serisA, inlineElm_End(a).
inlineElm(Y) --> inlineElm_Start(_,Y), serisA, inlineElm_End(_,Y).
% Block element and inline element validation End

% <table> validation
caption --> caption_start, caption_stop.
caption --> [].

tableHeader --> [].
tableHeader --> tr_start, serisHeader, tr_stop.
serisHeader --> [].
serisHeader --> th, serisHeader.
th --> th_start, seris, th_stop.

serisRow --> [].
serisRow --> row, serisRow.
row --> tr_start, serisCol, tr_stop.
serisCol --> [].
serisCol --> col, serisCol.
col --> td_start, seris, td_stop.
% Table validation End

% <select> validation
serisOption --> [].
serisOption --> option_start, option_stop, serisOption.
% select validation End

% Grouping of all the elements according to their properties.
blockElm_Start(table) --> table_start.
blockElm_Start(other,X) -->
      pre_start, {X=pre}
    ; p_start,   {X=p}
    ; div_start, {X=div}
    ; hr_start,  {X=hr}
    ; h1_start,  {X=h1}
    ; h2_start,  {X=h2}
    ; h3_start,  {X=h3}
    ; h4_start,  {X=h4}
    ; h5_start,  {X=h5}
    ; h6_start,  {X=h6}.

blockElm_End(table) --> table_stop.
blockElm_End(other,X) -->
      pre_stop, {X=pre}
    ; p_stop,   {X=p}
    ; div_stop, {X=div}
    ; hr_stop,  {X=hr}
    ; h1_stop,  {X=h1}
    ; h2_stop,  {X=h2}
    ; h3_stop,  {X=h3}
    ; h4_stop,  {X=h4}
    ; h5_stop,  {X=h5}
    ; h6_stop,  {X=h6}.

inlineElm_Start(other,Y) -->
      label_start,   {Y=label}
    ; code_start,    {Y=code}
    ; caption_start, {Y=caption}
    ; span_start,    {Y=span}
    ; strong_start,  {Y=strong}
    ; em_start,      {Y=em}
    ; i_start,       {Y=i}
    ; b_start,       {Y=b}.
inlineElm_Start(self,Z) -->
      img_start,      {Z=img}
    ; br_start,       {Z=br}
    ; input_start,    {Z=input}
    ; textarea_start, {Z=textarea}.
inlineElm_Start(select) --> select_start.
inlineElm_Start(a) --> a_start.

inlineElm_End(other,Y) -->
      label_stop,   {Y=label}
    ; code_stop,    {Y=code}
    ; caption_stop, {Y=caption}
    ; span_stop,    {Y=span}
    ; strong_stop,  {Y=strong}
    ; em_stop,      {Y=em}
    ; i_stop,       {Y=i}
    ; b_stop,       {Y=b}.
inlineElm_End(self,Z) -->
      img_stop,      {Z=img}
    ; br_stop,       {Z=br}
    ; input_stop,    {Z=input}
    ; textarea_stop, {Z=textarea}.
inlineElm_End(select) --> select_stop.
inlineElm_End(a) --> a_stop.

 

 

 

 

 

 

 

File dcg_lexicon.pl

% Lexicons
% Coded by Rizvi Hasan
% Date 20141013

% Head elements
html_start --> [html_start].
html_stop --> [html_stop].
head_start --> [head_start].
head_stop --> [head_stop].
meta_start --> [meta_start].
meta_stop --> [meta_stop].
title_start --> [title_start].
title_stop --> [title_stop].
link_start --> [link_start].
link_stop --> [link_stop].

% Body elements
body_start --> [body_start].
body_stop --> [body_stop].

% Block elements
h1_start --> [h1_start].
h1_stop --> [h1_stop].
h2_start --> [h2_start].
h2_stop --> [h2_stop].
h3_start --> [h3_start].
h3_stop --> [h3_stop].
h4_start --> [h4_start].
h4_stop --> [h4_stop].
h5_start --> [h5_start].
h5_stop --> [h5_stop].
h6_start --> [h6_start].
h6_stop --> [h6_stop].
div_start --> [div_start].
div_stop --> [div_stop].
p_start --> [p_start].
p_stop --> [p_stop].
hr_start --> [hr_start].
hr_stop --> [hr_stop].
table_start --> [table_start].
table_stop --> [table_stop].
th_start --> [th_start].
th_stop --> [th_stop].
tr_start --> [tr_start].
tr_stop --> [tr_stop].
td_start --> [td_start].
td_stop --> [td_stop].
pre_start --> [pre_start].
pre_stop --> [pre_stop].

% Inline elements
caption_start --> [caption_start].
caption_stop --> [caption_stop].
strong_start --> [strong_start].
strong_stop --> [strong_stop].
em_start --> [em_start].
em_stop --> [em_stop].
i_start --> [i_start].
i_stop --> [i_stop].
b_start --> [b_start].
b_stop --> [b_stop].
span_start --> [span_start].
span_stop --> [span_stop].
code_start --> [code_start].
code_stop --> [code_stop].
select_start --> [select_start].
select_stop --> [select_stop].

% SelfInline elements
img_start --> [img_start].
img_stop --> [img_stop].
br_start --> [br_start].
br_stop --> [br_stop].
input_start --> [input_start].
input_stop --> [input_stop].
textarea_start --> [textarea_start].
textarea_stop --> [textarea_stop].
option_start --> [option_start].
option_stop --> [option_stop].
a_start --> [a_start].
a_stop --> [a_stop].
label_start --> [label_start].
label_stop --> [label_stop].

 

 

 

 

 

 

 

 

 

 

C#.NET program code

// Coded by Rizvi Hasan
// Date 2014-09-28

using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Navigation;
using System.Windows.Shapes;
using System.Xml;
using System.IO;

namespace XHTMLReader
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        private string _strExcelFilename = "";

        public MainWindow()
        {
            InitializeComponent();
            _strExcelFilename = @"%userprofile%\documents";
            // Comment out in production
            //_strExcelFilename = @"C:\Users\rizvis\Documents\Mina Mapp\Dropbox\ID2213\ProjectProlog\Input Files\sample1.htm";
            lblInputFile.Text = _strExcelFilename;
        }

        private void btnFile_Click(object sender, RoutedEventArgs e)
        {
            //strExcelFilename = System.IO.Path.GetDirectoryName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName)
            //_strExcelFilename = "%userprofile%\\documents";

            // Do not import the namespace System.Windows.Forms. It will clash with other identifiers in WPF.
            var f = new System.Windows.Forms.OpenFileDialog();
            f.Filter = "HTML files (*.html, *.htm, *.xml)|*.html;*.htm;*.xml";
            f.InitialDirectory = _strExcelFilename;
            if (f.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                if (f.FileName != null && f.CheckFileExists == true)
                {
                    _strExcelFilename = f.FileName;
                    lblInputFile.Text = _strExcelFilename;
                    txtFileName.Text = f.SafeFileName;
                }
            }
        }

        private void btnRead_Click(object sender, RoutedEventArgs e)
        {
            xmlTextBlock.Text = string.Empty;
            // Good source for XHTML definition: http://www.w3schools.com/html/html_xhtml.asp

            HtmlDocument doc = new HtmlDocument();
            doc.Load(_strExcelFilename);

            //var myNodes = doc.DocumentNode.SelectNodes("//a[starts-with(@id,'menu-item-')]");
            List<HtmlNode> myNodes = doc.DocumentNode.Elements("html").ToList();
            ExploreNodes(myNodes);

            //xmlTextBlock.Text = xmlTextBlock.Text + " STOP!!!" ;

            write_to_File(xmlTextBlock.Text);
        }

        private void OutputLog(HtmlNode node, string indicator)
        {
            // Skip text and comment nodes (HtmlAgilityPack names them "#text" and "#comment").
            if (node.Name.StartsWith("#")) return;
            xmlTextBlock.Text = xmlTextBlock.Text + node.Name + "_" + indicator + " # ";
            xmlList.Items.Add(node.Name + "_" + indicator);
        }

        // Walk the tree depth-first, logging a start and a stop term for every node.
        private void ExploreNodes(List<HtmlNode> nodes)
        {
            foreach (var item in nodes)
            {
                OutputLog(item, "start");
                // Descend whenever there is at least one child, so a single nested element is not lost.
                if (item.ChildNodes.Count > 0)
                {
                    ExploreNodes(item.ChildNodes.ToList());
                }
                OutputLog(item, "stop");
            }
        }

        private void write_to_File(string p)
        {
            string mydocpath = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
            StringBuilder sb = new StringBuilder();

            // Split the logged text into individual terms
            var _nodes = p.Split(new string[] { " # " }, StringSplitOptions.RemoveEmptyEntries).ToList();

            // Build the string for the file: one Prolog term per line
            foreach (string node in _nodes)
            {
                sb.AppendLine(node.ToString().Trim() + ".");
                //sb.AppendLine(node.ToString().Trim() + " --> [" + node.ToString().Trim() + "].");
            }

            // Write the string builder string to a file.
            using (StreamWriter outfile = new StreamWriter(mydocpath + @"\Prolog\xhtml_nodes.txt"))
            {
                outfile.Write(sb.ToString());
            }
            lblOutFile.Text = mydocpath + @"\Prolog\xhtml_nodes.txt";
            lblNodeCount.Text = "Nodes : " + _nodes.Count.ToString();
        }
    }
}

 

Get your foot in the door with Delegates and Events


Delegates and events are among the most used techniques in a program. In my opinion, people use them the most but write them the least. Recently one of my juniors asked me to explain delegates and events, and that is why I am sitting here writing for the next generation of programmers.

Frankly speaking, delegates are a kind of advanced function pointer. Those who are familiar with C/C++ know that function pointers are a special type of pointer that stores the address of a function. This address can be passed freely throughout the program and invoked later, wherever it is needed.

On the other side, events are a kind of signal, as in C/C++. Your program can raise these signals, just as the operating system raises a signal when you plug in a USB stick; but someone out there needs to listen to the signal and handle it properly.

Well, all of that is theory; now let us look at practical use. Delegates and events are used in scenarios that call for the publish-subscribe pattern. What is the publish-subscribe pattern? See the Wikipedia article on it for a detailed description. In a nutshell, the publishers raise signals and the subscribers listen to those signals and act accordingly. What I want to emphasise about the publish-subscribe pattern is that the publisher should not have any knowledge of who the subscribers are, and the subscribers should not have any interaction with the publisher except one thing: listening to the publisher’s events.

In Publisher.cs

So now we have a publisher class, in which we need a delegate and a delegate-type variable. As discussed above, a delegate points to a function, so it must specify a function signature.

public delegate bool NewEditionPublishHandler(object Publisher, string EditionName , int EditionNr);

This function signature is the type of the delegate-variable.

public NewEditionPublishHandler Publish;

Later, in Publisher, this delegate-variable will point to a function in the Subscriber class, so that the publisher can call that function through the delegate-variable.


public delegate bool NewEditionPublishHandler(object Publisher, string EditionName , int EditionNr);

public NewEditionPublishHandler Publish;

… … …

Publish(this, "Harry Potter" , i++);

Notice that the Publisher is calling a function of a Subscriber without any knowledge of it. No reference, no variable, no knowledge at all, except the fact that the subscriber must listen to the signal sent by the publisher, i.e. the Subscriber must subscribe a function to that signal, and of course the signature of that function must match the delegate type.

In the subscriber class this is done with the following code.

In Subscriber.cs


public void SubscriberToPublisher(Publisher publisher)
{
   publisher.Publish += new Publisher.NewEditionPublishHandler(ShowPublicationDetail);

   // or (event subscription with the delegate keyword)

   publisher.Publish += delegate(object SubscribedPublisher, string publicationName, int publicationNr) { … return true; };

   // or (event subscription with a lambda expression)

   publisher.Publish += (SubscribedPublisher, publicationName, publicationNr) => { … return true; };
}

The three statements above do the same task with three different syntaxes.

Here two interesting things are happening.

  • The Subscriber is subscribing itself to a signal of the publisher: it binds a function (named or anonymous) to the publisher's signal, so that whenever the publisher raises the signal this function is executed.
  • The Publisher is delegating its task to the Subscribers: whenever the publisher calls the delegate-variable, it is the subscriber's function that gets executed.

 

In Program.cs

In our program somewhere we need to create a publisher and a subscriber and subscribe to the publisher.


Publisher P = new Publisher();
Subscriber S = new Subscriber();
S.SubscriberToPublisher(P);
Subscriber2 S2 = new Subscriber2();

S2.SubscriberToPublisher(P);
 
Finally, run the publisher's PublishRegularly().
 
P.PublishRegularly();

This PublishRegularly() function calls the delegate-variable at a regular interval with the appropriate parameters. The delegate-variable in turn delegates its task, along with its parameters, to the subscriber's function, so the job gets done by the subscriber.


public void PublishRegularly()
{
   while (true)
   {
      Thread.Sleep(2000);
      if (Publish != null)
      {
         Publish(this, "Harry Potter (part " + i.ToString() + ") ", i++);
      }
   }
}

Notice that we are doing a check (Publish != null). If no subscriber has subscribed to the publisher, the delegate-variable is null and calling it would throw a NullReferenceException. With the check in place the publisher can keep running its function, i.e. the publisher does not care whether any subscriber is subscribed to it or not.

Even if more than one subscriber is subscribed, it does not matter to the publisher either. It is the subscriber's responsibility to listen to the publisher's signal and act accordingly with the help of its own function.

Event

Now in the Publisher.cs file change the declaration of the delegate-type variable like this


   // before
   public NewEditionPublishHandler Publish;
   // after
   public event NewEditionPublishHandler Publish;

And in Subscriber.cs change the subscription to the publisher's signal (the event subscription) like this


   // before
   publisher.Publish += (SubscribedPublisher, publicationName, publicationNr) =>{…}
   // after (assignment instead of +=)
   publisher.Publish = (SubscribedPublisher, publicationName, publicationNr) =>{…}

With this you will get an error.


Error 1 The event 'Delegate_Event_Test.Publisher.Publish' can only appear on the left hand side of += or -= (except when used from within the type 'Delegate_Event_Test.Publisher') C:\Users\rizvis\Documents\Visual Studio 2010\Projects Test\Delegate_Event_Test\Delegate_Event_Test\Subscriber.cs 23 23 Delegate_Event_Test

Thanks to the ‘event’ keyword for raising this error, because with the statement


   publisher.Publish = (SubscribedPublisher, publicationName, publicationNr) =>{…}

you are not subscribing to the signal of the delegate-variable; rather, you are assigning a new value to it, which silently replaces every handler that other subscribers had already attached. This is wrong and your program will not work as intended. Without the ‘event’ keyword you will not get a compilation error, and thus you are planting a bug in your code. With the ‘event’ keyword, even if you mistype += as =, the mistake shows up at compile time.

Bird’s eye view

So with a delegate we are preparing an object (i.e. the Publisher) to emit a signal to another set of objects (i.e. the Subscribers) that have subscribed a function to the publisher's signal. When the signal is raised, the subscribers execute their own functions. So simple :)

Download source code
Publisher.cs
Subscriber.cs
Program.cs

Hack around with PECL libs for php in Lubuntu


From my very childhood I dreamed of being many things when I grew up: from a soldier to a sailor, from a pilot to a postman, from an engineer to an innovator. But I never, by any chance, dreamed of being a writer. Now I am writing.

In the same way, life may make you need certain things you never thought about, as when I needed to install a PECL library for PHP on an Ubuntu machine. How to do it is a pretty straightforward set of steps and not very interesting; what is interesting is my experience with the installation.

I needed a PHP function called ‘id3_get_tag’, which requires the id3 package. This package can be found in a repository called PECL. The id3 package is maintained by Stephan Schmidt and Carsten Lucke. My thanks to them. I, and many others like me, are always thankful to the people who maintain this kind of open source library.

You can download the id3 extension from PECL here. Download it to your Ubuntu machine (for me it was a Lubuntu machine) and unzip it. Then run these commands


cd ./Downloads/Temp/id3-0.2
phpize
./configure
make

Ignore the errors of the phpize command. The phpize command is used to prepare the build environment for a PHP extension, in our case the id3 extension from PECL. After the make you will find a shared library called id3.so, possibly in the modules directory.

Now all you have to do is put this library in a place where php5 can reach it. To find out that place you will have to hack a little bit. Look for other libraries of this kind in php.ini, mysql.ini or gd.ini. For example

In the gd.ini you will find

extension=gd.so

In the mysql.ini you’ll find

extension=mysql.so

Next we have to find the location of these files with the commands


find / -type f -name 'gd.so'
find / -type f -name 'mysql.so'

If you compare the paths you will see a common location for these two files (../php5/…/gd.so). Bang!! That is the location from which php5 is loading them. In our case the location was /usr/lib/php5/20090626+lfs.
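As a small sketch of the comparison step, the helper below prints the directory that two extension files share. The name common_ext_dir is hypothetical (it is not part of PHP), and the example paths assume both files sit in the location found above. On machines where the PHP development tools are installed, php-config --extension-dir should report the same directory directly.

```shell
# Hypothetical helper: print the directory shared by two extension files.
# $1 and $2 are full paths, e.g. taken from the find commands above.
common_ext_dir() {
    dir_a=$(dirname "$1")
    dir_b=$(dirname "$2")
    if [ "$dir_a" = "$dir_b" ]; then
        printf '%s\n' "$dir_a"
    else
        # The files live in different directories, so there is no single answer.
        return 1
    fi
}

# Example with the paths from this post (assumed to be where find located them).
common_ext_dir /usr/lib/php5/20090626+lfs/gd.so /usr/lib/php5/20090626+lfs/mysql.so
# prints /usr/lib/php5/20090626+lfs
```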

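Now all we have to do is copy our shared library id3.so to that location. Then create an id3.ini file in /etc/php5/conf.d and write ‘extension=id3.so’ in it. Note that a plain "sudo echo … > id3.ini" does not work here, because the redirection is performed by your own shell without root rights, so pipe the text through sudo tee instead.

cd /etc/php5/conf.d

echo "extension=id3.so" | sudo tee id3.ini

Restart Apache.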

sudo /etc/init.d/apache2 restart

Now test a page that calls ‘id3_get_tag’; it will execute successfully. Voilà!! You have successfully installed the id3 library from PECL.

You can do exactly the same for the other libraries in the PECL repository.

Have faith in your detective mind while dealing with php.ini and apache2

Look for the problem

 

More or less we all know that solving a problem in the programming world needs a lot of detective work. And in my experience of detective operations (mostly from Hollywood movies) there are some rules of thumb.

  1. Follow the trail up to a reasonable ending
  2. Try to link the points

Last night I was caught up in a tedious problem with my Apache server and php5. I needed to upload large files to the server, more than 10 MB each. By default apache2 with php5 won’t let you upload files larger than 2 MB to the server.

To make it do the task you have to modify the php.ini file, which normally resides in /etc/php5/apache2/. In this file there is a setting upload_max_filesize = 2M. I needed to set it to upload_max_filesize = 20M. So I did, and tried to upload a file using a PHP script. A typical script of this kind can be found here. To find out why an upload fails, check the $_FILES['userfilefield']['error'] value. If the value is 1, the file exceeds the upload_max_filesize limit. More of these error values are described here.
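The limits above are written in PHP's shorthand notation, where K, M and G stand for multiples of 1024. As a rough sketch (to_bytes is a hypothetical helper, not a PHP or Apache tool), the conversion can be done like this to see whether a given file fits under a limit:

```shell
# Hypothetical helper: convert a PHP shorthand size such as 2M or 20M to bytes.
to_bytes() {
    num=${1%[KkMmGg]}            # strip a trailing K, M or G if present
    case $1 in
        *[Kk]) echo $(( $num * 1024 )) ;;
        *[Mm]) echo $(( $num * 1024 * 1024 )) ;;
        *[Gg]) echo $(( $num * 1024 * 1024 * 1024 )) ;;
        *)     echo "$num" ;;    # plain number of bytes
    esac
}

to_bytes 2M     # prints 2097152
to_bytes 20M    # prints 20971520
```

A 10 MB file is 10485760 bytes, which is above the default limit and below the new one.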

But to my surprise the value of $_FILES['userfilefield']['error'] was still 1, which means the file size was still causing the problem even after my change. I scratched my head for a little while and then started to think about what else could be wrong. To find out, I triggered my detective mind and rolled out a list of reasons that could cause this failure.

Detective question 1

Is the apache loading this php.ini file properly?

To find this out I need to run a script, say testup.php, containing <?php phpinfo(); ?>. There I will be able to see a list of the flags loaded by apache2 and their values. To find out whether php.ini is loaded or not, check this row.

Loaded Configuration File /etc/php5/apache2/php.ini

So it’s loading or at least started loading the file.

So the next question is –

Detective question 2

Has the upload_max_filesize flag value loaded in apache?

Check that flag in the output of testup.php

upload_max_filesize 2M 2M

Bang!! It’s not loading the flag value properly. And that is why it’s failing to upload my large file.

But why is the value not loaded, when I changed it in php.ini myself and saved the file properly? So the big question is

Detective question 3

Why is the upload_max_filesize flag not loaded even though the php.ini file has started loading?

At this point I was derailed from my detective rules and became impatient. I started googling for a patch to fix it rather than finding the answer to the question above. With this mistake I began a lot of suffering that I could easily have avoided if I had stuck to the detective rules.

Likely solution 1:

One solution seemed most likely to solve my problem: using a .htaccess file in the directory where testup.php resides. In this .htaccess file you just have to put this

php_value upload_max_filesize 20M
php_value post_max_size 22M

Then, when testup.php is accessed, apache2 automatically changes the values of those flags. But, in my ignorance, I did not know that this only works when apache2 has successfully loaded its PHP configuration and has been explicitly allowed to override these values (for a .htaccess file, Apache's AllowOverride setting must permit it). So, very reasonably, it did not help me either.

Likely solution 2:

The other solution that seemed suitable was to run “php -i | grep php.ini” and see where php5 is loading php.ini from, which was “/etc/php5/cli/php.ini”. But here too, in my ignorance, I missed that this php.ini has nothing to do with apache2: it is the one used by the command-line PHP. So, very reasonably, changing the flag value in this file had no effect on apache.
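To compare what the two files actually say, a minimal sketch along these lines may help. ini_value is a hypothetical helper, and the two paths are the ones from this post; other systems may keep them elsewhere.

```shell
# Hypothetical helper: print the value assigned to a setting in an ini file.
# $1 = ini file, $2 = setting name. Lines commented out with ';' are skipped,
# and if the setting appears several times the last assignment is printed.
ini_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Show the upload limit in the Apache and the command-line configuration.
for ini in /etc/php5/apache2/php.ini /etc/php5/cli/php.ini; do
    if [ -r "$ini" ]; then
        printf '%s: %s\n' "$ini" "$(ini_value "$ini" upload_max_filesize)"
    fi
done
```

This only reads the text of each file. What apache2 has really loaded is still best checked in the testup.php output.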

At the verge of my patience and back to the detective question

Why is the upload_max_filesize flag not loaded even though the php.ini file has started loading?

Finally I came back to my detective question and started following my detective mind. Now I had to check whether apache2 had successfully finished loading the php.ini file. To check this I had to look at the apache2 log, which is “/var/log/apache2/error.log” on my Ubuntu machine.

Solution comes automatically

To my surprise I found lots of errors from loading php.ini, as below.

PHP: syntax error, unexpected ‘&’ in /etc/php5/apache2/php.ini on line 110
PHP: syntax error, unexpected ‘&’ in /etc/php5/apache2/php.ini on line 110

Then I understood that even though apache2 had started loading the php.ini file, these errors did not let it finish loading successfully.
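The log check can be sketched as a small filter. php_ini_errors is a hypothetical helper name, and the sketch assumes the log lines look like the two shown above.

```shell
# Hypothetical helper: list each distinct php.ini syntax error found in a log.
# $1 = path to the Apache error log.
php_ini_errors() {
    grep -F 'syntax error' "$1" | grep -F 'php.ini' | sort -u
}

# On the machine from this post the call would be:
#   php_ini_errors /var/log/apache2/error.log
```

Repeated lines are collapsed by sort -u, so an error that is logged on every restart shows up only once.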

Now life became easy: fix those errors in the php.ini file and restart apache2. After that my large files were uploaded successfully with my testup.php script. The problem in my php.ini file was the value “Default Value: E_ALL & ~E_NOTICE”. Somehow apache could not parse the “&”, so I used “Default Value: E_ALL” instead of the previous value. But this new setting introduces a new problem: compilation errors are not shown on the page. To fix that, set the display_errors flag to On, i.e. “display_errors = On”. Finally the problem was solved as I wanted. Sweet, isn’t it?

Bottom line: stick to your detective mind no matter how complex the question becomes.