
Friday, June 15, 2012

SBAs - Search Based Applications



Googling for the above term (as of this day) does not return quality results. However, if you search for Search Based Applications (SBAs) you will hit a Wikipedia entry and a few other articles that use the same term. You will even find a book by that name on Amazon. To sum up, it simply means that my thought process is cutting edge ;) ...come on, this is my blog, I can brag!

But I'm not kidding, the idea in itself is very simple... just use the right data structure (inverted indexes) for read-intensive jobs and it dramatically simplifies your application architecture and brings God-sent agility to your deliveries. Let me try to explain that...

In traditional database-centered applications, data stored in relational databases can only be retrieved through SQL, which is a good thing as it standardizes information access. However, let us try to contrast them with SBAs;

  • The desired information may be too complicated to derive from the existing data (think query complexity, frequency of the requested information, scalability), while SBAs can process both structured and natural-language queries and then provide faceted navigation, which lets users slice-n-dice data in innumerable ways.
  • The needed query may not exist yet, and users will have to wait for the completion of the next sprint; with SBAs, if you have indexed the data, the information is mostly already available. Think fast time to market, high agility!
  • Forget multiple tables; think about a scenario where you need to collate information from multiple data sources, read: emails, blog entries, CSVs, PDFs, office documents. Simply put, if you can grok text, SBAs can digest it to serve up intelligent information later.
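
To make the data-structure point concrete, here is a minimal, hypothetical sketch of an inverted index in plain Java; the class and method names are mine, and a real SBA would of course lean on a search library such as Apache Lucene rather than on a toy like this;

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SimpleInvertedIndex {

    // The inverted index: each term maps to the ids of the documents containing it.
    private final Map<String, Set<Integer>> index = new HashMap<String, Set<Integer>>();

    // Tokenize once at write time; reads then become cheap hash lookups.
    public void addDocument(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            Set<Integer> postings = index.get(term);
            if (postings == null) {
                postings = new HashSet<Integer>();
                index.put(term, postings);
            }
            postings.add(docId);
        }
    }

    // Return ids of documents containing every query term (AND semantics).
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            Set<Integer> postings = index.get(term);
            if (postings == null) {
                return Collections.<Integer>emptySet();
            }
            if (result == null) {
                result = new HashSet<Integer>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    public static void main(String[] args) {
        SimpleInvertedIndex idx = new SimpleInvertedIndex();
        idx.addDocument(1, "Search based applications are built on inverted indexes");
        idx.addDocument(2, "Relational databases standardize information access through SQL");
        System.out.println(idx.search("inverted indexes")); // prints [1]
    }
}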


As if that were not enough motivation for us to experiment more with SBAs, here is some more;

  • SBAs are more tolerant of user queries and can make spelling suggestions and return results for synonymous queries
  • You can integrate your SBA with a statistics-based system to identify popular entries or make recommendations using clustering and collaborative filtering
  • You can even filter out results which are not suitable for the user; think access permissions, adult content, content exclusively for club members


Mind you, all this talk is not just about results generated from a 'search text box'; the usage can be more implicit, driven by the user's context and application-specific use cases like content browsing.

Today I've just scratched the surface, more later..

Do let me know your queries and feedback.


Sunday, April 1, 2012

Wannabe Architect 101

I feel fortunate to be a professional Software Engineer, as no other career option would have given me the luxury to contribute in myriad different ways and the power to change things for good. But with power comes responsibility, including the responsibility to conduct oneself gracefully in adverse situations. Today, I'm going to share some of my first-hand experiences working in a technical leadership role;
  • you do not have authority - get this straight into your head, as it sets the foundation for our discussion.
  • you have to sell/evangelize your ideas - as you are not a manager you cannot (and should not) try to force your ideas/opinions. However, you can evangelize the framework/technology/language options of your choice by providing a working proof of concept (be ready to trash it if it doesn't work, though!). Make a note to address people's concerns, like the difficulty of comprehending the "new" thing, or maybe help them appreciate the concerns you have about the existing thing. Remember, any change forces people to come out of their comfort zones. The change you are trying to introduce must not be intimidating for PEOPLE
  • assert without being blunt - although I have no right to preach this, people who have worked with me know that I'm working aggressively on ways to put things in better context. I still favor shared objectivity, as it helps me focus on the task at hand.
  • invest in your relationship - "If someone is strongly disliked, it’s almost irrelevant whether or not she is competent; people won’t want to work with her anyway. By contrast, if someone is liked, his colleagues will seek out every little bit of competence he has to offer."
  • people have short-term impressions and long-term memory - use it in your favor - it is never too late to start afresh. Lead by example; people will copy "what works" and chances are some of them will eventually appreciate the concepts. If you want your team to write unit tests, show them the ones you have written for some of the components in the current project and then tell them how they helped your productivity, rather than the other way around. If you go gaga about DI and AOP and FP or any other alphabet soup and list the benefits, chances are people won't practice it; however, if you just do it and others in the team start noticing the improved productivity, the smarter ones will copy. Be ready to help them without being preachy or The Guru.. you have created your community of users!!

To top it all, be stateless! Like one of those web services you might design yourself. Do not carry the emotional backlog you created while working with people you cannot stand or who are plain incompetent. Start afresh, it helps to travel light.

Saturday, September 3, 2011

Cosmic Link


Now that the free lunch is over, the onus lies on us (read: Software Engineers) to gainfully utilize the computing resources available to us. It is therefore imperative that we develop and hone our fundamental skill sets and maybe stop (OK, pause) running after the next 'big' thing (read: technology/framework).

For those of you who have not yet hit the 'dead-end', be warned. It would help you to develop your skills in;

  • Problem Solving, because no matter what 'flashy' thing you know, there will always be something new hitting you every now and then. Learning to solve problems not only makes you more employable but also makes you more confident and an inspiration to others (think thought leadership)
  • The choice of Data Structures and Algorithms you employ in your applications may be all you need to optimize their non-functional needs, and nothing is more satisfying than making your application perform better.
  • While, argumentatively speaking, I do agree that frameworks are making us dumb, that is largely because we are always in a hurry to just complete the task without really understanding the framework internals. In the not so distant past, I found it difficult to solve a bug which crept into my code because of my incorrect understanding of the java.lang.String#split(String regex) method. The problem was later identified and fixed in a matter of a few seconds merely by reading the associated javadocs (after everything else failed and I had decided to raise a bug!!), saving me from the embarrassment. A small illustration follows this list.
  • Writing concurrent code is both error prone and difficult, and until recently I didn't really face a performance issue; maybe the free lunch was still available to me... but no more. Work on it.
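
I won't claim this was my exact bug, but the two documented behaviours of String#split that most often trip people up can be reproduced in a few lines;

import java.util.Arrays;

public class SplitSurprises {
    public static void main(String[] args) {
        // 1. The argument is a regular expression, not a literal separator:
        //    "." matches any character, so every token is empty and gets dropped.
        System.out.println("a.b.c".split(".").length);             // 0
        System.out.println(Arrays.toString("a.b.c".split("\\."))); // [a, b, c]

        // 2. Trailing empty strings are removed unless a negative limit is passed.
        System.out.println("a,b,,,".split(",").length);      // 2
        System.out.println("a,b,,,".split(",", -1).length);  // 5
    }
}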

The above piece is not only relevant to Java developers but to us all. Think about it: knowing your DB internals (say, the join algorithm) will help you optimize your queries, or if you write shell scripts, knowing the correct order of filtering might put your script on steroids...

Please do not get offended by the above; the tone might sound prescriptive.. but it is really a result of my reflections on my recent misadventures and is put up for public notice lest you repeat them yourself.

Sunday, January 9, 2011

Innovation - Generating Ideas: The monkey's view

Not again.. not another hyperbolic session on INNOVATION. But these days almost everyone in my social circle at least hints about innovation. Not so surprisingly, it is a constant endeavour of our species to be introspective and improve continuously, so much so that historians have even classified our cultural heritage in terms of the innovations that distinguished each era, viz: the stone age, the iron age.. Similarly, the role of innovation at enterprises is pretty much tied to their sustainability and even existence.
 
The dictionary defines the term in a rather half-baked manner (blasphemous?!!): while it does appreciate creating the "new thing", it safely discounts the effort that goes into continuous improvement of the existing "thing". Sadly, the incremental or continuous improvements made to existing "things" meet the same fate and go unnoticed at our enterprises.

Also, we tend to focus more on the fruits of innovation, that is, the "product" itself, and tend to take the process part for granted, or much worse treat it as "a necessary evil", at least in my software industry. It is only timely that much of the relevant work is already published (think continuous delivery), but it is yet to make it to the mainstream, howsoever "agile" the enterprises may like to call themselves.

I understand by now you must have already read my mind and must be wondering if I'm really going to add any value... some of the ways I think we can make a good start can be summarized as under;
  • Become a producer of content.. in my limited scope of understanding, I feel most of us do not write, express ourselves, or take a stand for fear of being judged. Let's change that and start putting our thoughts in black and white here on blogs, comment on others' blogs.. connect. Not many of our experienced leaders write. Writing is such a powerful medium that one can reach out to a whole bunch of people at one time, which is a boon for our time-pressed industry leaders.
  • In one of my earlier posts I discussed the challenges I faced in sharing knowledge. Sharing knowledge can help us come out of our stove-pipe modes and satiate our higher needs of self-actualization. There is a nice article which covers the subject in greater depth and also provides the relevant background knowledge. You will do well to read it for yourself.
  • Management determines the organizational climate. It can send a clear message through rewards, mentoring employees to participate in open source software products, participation at technical conferences, taking engineers to customer meetings, having the sales team participate in test cycles...
  • The above-mentioned set of activities creates and encourages the cross-pollination of ideas, which can create magical results.. you never know when that bulb glows in someone's head, the EUREKA moment!!
  • Support innovators: as per Seth Godin, if you don't support and nurture them it simply puts a spanner into the innovation engine even before it has gained any considerable momentum.
Innovation is critical to our growth, and generating ideas is critical to innovation itself, while thinkers must understand that their responsibilities have just begun. Here I have made an attempt to showcase some of the problems which should be addressed to create value-added services for our society.

Saturday, November 27, 2010

Product Versioning: Embedding packaging information

Almost serendipitously, I discovered the Java Package class today. One must wonder how this could possibly make a difference (aka increase one's geek quotient or coolness factor). Well, the beauty lies in the details.

Problem Statement: You commit your code to CVS, and after going through the complete lifecycle experience your code finally sees the light of day, only for you to discover, much to your chagrin, that users have found some bug, even after all that unit testing and pragmatic ranting ;) But then, you are the rock-star developer who had already discovered the problem and fixed it :D But how do you know whether the user is running an older version of your package???


Solution: You can embed the information in the manifest file at build time, which could be read by exploding the jar, simple!! NO, most of your users wouldn't (shouldn't) know how to do that. It would be really nice if you could print this star-studded information at the beginning of code execution. You can make this the first line of your log file, or maybe write it to a separate file which could be used for bug reporting; the options are open.. How do you do it?

Step 1: Create an annotation.

package org.zero;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.PACKAGE)
@Retention(RetentionPolicy.RUNTIME)
public @interface MyVersionAnnotation {

    String version();

    String revision();

    String date();

    String user();

    String url();
}

Step 2: Generate a package-info.java file for the package and annotate it:
@MyVersionAnnotation(date = "2010-11-26", revision = "11", url = "http://onjava.com/pub/a/onjava/2004/04/21/declarative.html?page=3", user = "nitin", version = "123")
package org.zero;

Step 3: Create a class to access this information;



package org.zero;

/**
 * This class finds the package info for mypackage and the MyVersionAnnotation
 * information.
 */
public class PackageDemo {
    private static Package myPackage;
    private static MyVersionAnnotation version;

    static {
        myPackage = MyVersionAnnotation.class.getPackage();
        version = myPackage.getAnnotation(MyVersionAnnotation.class);
    }

    /**
     * Get the meta-data for the mypackage package.
     *
     * @return the Package object whose annotations carry the version information
     */
    static Package getPackage() {
        return myPackage;
    }

    /**
     * Get the mypackage version.
     *
     * @return the mypackage version string, eg. "0.6.3-dev"
     */
    public static String getVersion() {
        return version != null ? version.version() : "Unknown";
    }

    /**
     * Get the subversion revision number for the root directory
     *
     * @return the revision number, eg. "451451"
     */
    public static String getRevision() {
        return version != null ? version.revision() : "Unknown";
    }

    /**
     * The date that mypackage was compiled.
     *
     * @return the compilation date in unix date format
     */
    public static String getDate() {
        return version != null ? version.date() : "Unknown";
    }

    /**
     * The user that compiled mypackage.
     *
     * @return the username of the user
     */
    public static String getUser() {
        return version != null ? version.user() : "Unknown";
    }

    /**
     * Get the subversion URL for the root mypackage directory.
     */
    public static String getUrl() {
        return version != null ? version.url() : "Unknown";
    }

    /**
     * Returns the buildVersion which includes version, revision, user and date.
     */
    public static String getBuildVersion() {
        return PackageDemo.getVersion() + " from " + PackageDemo.getRevision()
                + " by " + PackageDemo.getUser() + " on "
                + PackageDemo.getDate();
    }

    public static void main(String[] args) {
        System.out.println("mypackage " + getVersion());
        System.out.println("Subversion " + getUrl() + " -r " + getRevision());
        System.out.println("Compiled by " + getUser() + " on " + getDate());
    }
}
PS: Source code courtesy org.apache.hadoop.util.VersionInfo
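
For what it is worth, running PackageDemo with the annotation values from Step 2 should print something along these lines;

mypackage 123
Subversion http://onjava.com/pub/a/onjava/2004/04/21/declarative.html?page=3 -r 11
Compiled by nitin on 2010-11-26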

Tuesday, May 25, 2010

Hadoop application packaging

Job jar must be packaged as below;

job.jar
|--META-INF
|----MANIFEST.MF
|------Main-Class: x.y.z.Main
|--lib
|---- commons-lang.jar Note: Place your dependent jars inside lib directory
|--org.zero
|---- application classes here
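
For completeness, here is a minimal, hypothetical sketch of what the Main-Class named in the manifest could look like; the Tool/ToolRunner pattern is just my choice for illustration, not something the packaging requires;

package x.y.z;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical entry point: 'hadoop jar job.jar' invokes main() because the
// manifest declares x.y.z.Main as the Main-Class.
public class Main extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // job configuration and submission would go here
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Main(), args));
    }
}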

Archiving a large number of small files into a small number of large files

A small file is one which is significantly smaller than the HDFS block size (default 64MB).

We have a lot of data feeds in the range of 2MB per day; storing each as a separate file is non-optimal.

The problem is that HDFS can't handle lots of files because every file, directory and block in HDFS is represented as an object in the namenode's memory, each of which occupies about 150 bytes. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware; certainly a billion files is not feasible.

Furthermore, HDFS is not geared up for efficiently accessing small files: it is primarily designed for streaming access to large files. Reading through small files normally causes lots of seeks and lots of hopping from datanode to datanode to retrieve each small file, all of which is an inefficient data access pattern.

Also, HDFS does not support appends (follow http://www.cloudera.com/blog/2009/07/file-appends-in-hdfs/).

Known options are;
  1. Load data into an HBase table and periodically export it to files for long-term storage. Something like keeping the product log for a particular date/timestamp against the content of the files, stored as plain text in an HBase table.
  2. Alternatively, we can treat these files as pieces of a larger logical file and incrementally consolidate additions into a newer file. That is, file x was archived on day zero; the next day, new records are available to be archived. We rename the existing file to, let's say, x.bkp and then execute a MapReduce job to write the content from the existing file and the new file into file x.
  3. Apache Chukwa solves a similar problem of distributed data collection and archival for log processing. We can also take inspiration from their design and provide our own custom solution to suit our requirements, if needed.

Thursday, March 11, 2010

Push Button Automation for the Humans

Have you ever been responsible for the installation of software which must be distributed and installed on a myriad set of execution environments with equally diverse sets of configuration, in a department where machines are constantly being cleaned and re-imaged? Picture a situation where you are supporting various configurations and levels of your module. Rather than spending hours installing by hand, wouldn't it be great to have a way to automate the installation process so that you could just kick it off, go and get coffee, come back, and have it all installed and ready? Call it 'Push Button' automation :) (Please bear with me for throwing around newer phrases)

Strange as it might seem, this hand-made configuration is proving to be a nightmare of sorts.. one might feel exhausted, without really having any sense of accomplishment, from solving petty issues which should never have come up. It is high time one set up;
  • a standardized installation setup
  • application diagnostics which enable the operations team to solve small issues in time.
Time and again, Apache Ant comes to my rescue.

Thursday, March 4, 2010

How does the data flow when a job is submitted to Hadoop?

Based on the discussion here, typically the data flow is like this:
  1. Client submits a job description to the JobTracker. 
  2. JobTracker figures out block locations for the input file(s) by talking to HDFS NameNode. 
  3. JobTracker creates a job description file in HDFS which will be read by the nodes to copy over the job's code etc. 
  4. JobTracker starts map tasks on the slaves (TaskTrackers) with the appropriate data blocks. 
  5. After running, maps create intermediate output files on those slaves. These are not in HDFS, they're in some temporary storage used by MapReduce. 
  6. JobTracker starts reduces on a series of slaves, which copy over the appropriate map outputs, apply the reduce function, and write the outputs to HDFS (one output file per reducer). 
  7. Some logs for the job may also be put into HDFS by the JobTracker.
However, there is a big caveat, which is that the map and reduce tasks run arbitrary code. It is not unusual to have a map that opens a second HDFS file to read some information (e.g. for doing a join of a small table against a big file). If you use Hadoop Streaming or Pipes to write a job in Python, Ruby, C, etc, then you are launching arbitrary processes which may also access external resources in this manner. Some people also read/write to DBs (e.g. MySQL) from their tasks.

Tuesday, February 16, 2010

Evolving programming paradigms

The free lunch is over and it simply means that;
  • Our applications may no longer automatically benefit from hardware upgrades; we will have to plan, design and make room for concurrency.
  • As developers we need to hone our skills to make good use of hyper-threading, multi-core CPUs and caching.
  • Do not get confused by marketing slogans about new-found technologies. Seldom does a technology mature fast enough to become mainstream overnight; generally it is older technologies, increasingly useful with time and with their developers' better understanding, that can gainfully be used to solve important problems.
  • Because our applications do not automatically gain performance with newer hardware, solving performance problems and performance tuning will become an important activity.
  • Start preparing now.

Saturday, January 30, 2010

Bug fix: Eclipse springsource on UBuntu 9.10

...so you have upgraded to Ubuntu 9.10 and you are really impressed with its new features, and you cannot control your excitement to check out your development environment and launch Eclipse.. but oops! You are not able to create a new project.. none of the buttons seem to work. Chances are you are suffering from the same bug; follow the link for the fix.

This issue is resolved in the Eclipse 3.5.2 M2 release. Hope that helps.

Friday, July 24, 2009

Finally spotted the 'Tiger'

Once upon a time, long, long, long ago, SCJP 5 was newly released and it was a proud moment in my life to secure a distinction in it. I felt like I had 'Tamed the Tiger'; never ask how I felt working on versions 1.4 and 1.3 all these years! Never mind, the 'Tiger' will soon be a sunset technology from Sun's stable... never mind, 'Mustang' is galloping faster and faster... Today I spotted the 'Tiger'. I'm happy with the opportunities at hand to experiment... looking forward to playing with the beast...

Sunday, June 21, 2009

Complete idiot's guide to Debugging

Time and again I have learnt, much to my displeasure, that I'm not efficient while debugging. More often than not, I would just miss the point and keep 'searching' for the issues. Sitting back, I found one fundamental flaw in my approach to debugging a problem: there was no structure in my method! Instead of narrowing down cause-effect scenarios I was simply 'searching for the needle in the haystack'. This would obviously take more time than expected and lead to sheer wastage of my resources.

Learning about the scientific method of experimentation, I found that solutions to problems too complex for common sense to solve are achieved by a mixture of inductive and deductive techniques.

Inductive logic demands that we start by observing the 'system' and arrive at general conclusions. Make a mental note of your observations and try finding explanations for them; chances are you will hit a few observations which are not clearly understood in the given context, something which is a kind of 'mystery' that you find tough to explain the first time. Hold on, do not get distracted, and collect information. Having done that, now try to break down your thoughts and write them under the following heads;
  1. Problem statement: The biggest mistake to avoid here is writing too much, things which you are not sure of.. just assumptions. Rather, it is much better to state the problem straight. It might sound dumb in the beginning but it will save you from making dumb efforts based on incorrect assumptions later..
  2. Hypotheses for the problem: typically, list down the possible reasons for the failure.
  3. Experiments to test your hypotheses: The most important mistake to avoid here is that the experiments must be designed to test the hypotheses, nothing more, nothing less. This step will help you narrow down your solution/problem area.
  4. Expected outcomes will help you draw a baseline for your experimentation.
  5. Actual outcomes must be compared and contrasted with the expected outcomes for rigour in the testing process. Any deviations must be carefully studied and isolated to ensure that the anomaly is not because of some mistake or incorrect measurement; this will save one from drawing incorrect conclusions.
  6. Conclusion: here again, care must be taken to state no more than what the experiment proves/disproves.
By asking the right questions, conducting the right tests and drawing the right conclusions one might drill down the hierarchical structure of the problem to isolate the point of failure and fix it so that the system no longer fails at this point.

One might need to go to and fro between the two logical techniques to reach a plausible solution. Need to put this learning into practice now...

Thursday, April 23, 2009

Debugging

This is funny, this is not funny for me.

During this debugging session, I'm continuously guilt-ridden because the technical failure is delaying the delivery, and it is increasingly time consuming because I need to question each of my assumptions. That sucks!

After my stress-profile-code-profile-stress cycles I did, however, learn a bit more about the various JVM GC tuning options. Also, in an overzealous attempt to gain customer delight, the code was doing far more than required. I took a step back and, thinking from the usability perspective, the problem was immediately visible. Thanks to proper encapsulation none of the interfaces broke, and the affected code module could be refactored to fix the performance issue at hand. Further, tweaking the data structure used to collect data for generating the performance reports helped us achieve performance gains. Now we are able to do more with less. The trick is finding the application of technology that makes the most sense for the problem at hand.

Looks like my build is complete, let me profile my code again..

Tuesday, April 21, 2009

Java: Heap crunch

Profile-code-deploy-load was all that I did yesterday, and yes, it did improve the system performance; technically, it survived the brutal four-hour load test without causing an OOME (OutOfMemoryError). The primary hypothesis I'm working under is that certain long-living references have retained a sizable amount of memory. But I'm still not able to pin down the exact root cause of the system degradation under load. The important observations are;
  • We are loading the system using a pattern where we ramp up the load in x minutes, maintain the peak load for y hours and then ramp down to zero in x minutes.
  • The heap usage percentage follows a similar pattern during ramp up, but although the load has plateaued, the heap usage percentage continues to go higher; during ramp down it eases a bit but never follows the decrease in load and never actually reaches zero or the minimum, even though there is no activity in the application.
  • CPU time in GC increases almost linearly, albeit with a low gradient, and decreases to zero after the load completes.
I plan to study the GC logs today to understand the system characteristics and fine-tune the JVM to fit our purpose, and I am still not sure if I have a memory leak... more later.

Sunday, April 19, 2009

Java: Memory leak

Around two years back, when I encountered an article mentioning memory leaks in Java, it was a little surprising to me, and maybe the author knew his audience and went ahead to provide a demo program. It surely made me aware of the issue and certain anti-patterns to avoid. Then I encountered my very own first Java memory leak, and realised that my skills were not adequate to handle it. Here, in the current case, when I first observed this phenomenon I went ahead and explicitly 'nulled' my references, which were otherwise supposed to go out of scope anyway and hence become eligible for garbage collection; but to my frustration this has only marginally helped, as the changes have only reduced the rate of the memory leak and have not eliminated it. Now I plan to work on the associated anti-patterns, like weak references, or references to short-lived objects held by long-living objects, etc. Will keep you posted.
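
For readers who have not met this anti-pattern yet, here is a tiny, contrived illustration (not my actual code) of a long-living object holding on to objects that were meant to be short-lived;

import java.util.ArrayList;
import java.util.List;

public class LeakyCache {
    // Long-lived (static) reference: everything added here stays reachable for
    // the life of the JVM, so the garbage collector can never reclaim it.
    private static final List<byte[]> CACHE = new ArrayList<byte[]>();

    public static void handleRequest() {
        byte[] buffer = new byte[1024 * 1024]; // meant to be short-lived...
        CACHE.add(buffer);                     // ...but it is never removed
    }

    public static void main(String[] args) {
        while (true) {
            handleRequest(); // eventually ends in an OutOfMemoryError
        }
    }
}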

Thursday, April 9, 2009

Failed to connect to remote VM. Connection refused. Connection refused: connect

A lot of time was wasted resolving the above-mentioned error while I was trying to connect to a remote server using my Eclipse IDE in debug mode. Here is the solution;

  1. Compile your application using the '-g' option. This will generate all the debugging-related information, including local variables. By default, only line number and source file information is generated.
  2. Launch a new Java Virtual Machine (VM) with the main class of the application to be debugged, e.g. $> java -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000
Hope that helps.

Thursday, March 12, 2009

Why slow?

Hmm.. so good that you are able to monitor the response times of the application components, but then how does that help in tuning the application... do not count activity.. show me the productivity.. So here I am, able to monitor my code and generate a couple of reports.. but then they didn't really help me understand the performance bottlenecks.. I would run them again and again to aggregate more reports!! No amount of diligence seemed to work.. imagine, I was not able to find problems.. that reminds me of the adage 'bin mange moti mile aur mange mile na bheekh' (unasked you may get pearls; asking, you may not even get alms)... I had understood there was a sure-shot knowledge gap between the technical and the business perspectives... after all, we are still far from creating tools which would scan, identify and fix code on their own. There are just so many options for every refactoring which one intends to make in the code.. one needs to weigh one's options before making a code change lest it should fall under its own weight...

Ok, I digressed, because even while conducting my own research I chanced to read interesting articles which gave me a peek into the future about gainfully utilizing concepts from autonomic computing for application monitoring as an aid to debugging applications... but 'Show me the results' was driving me crazy.. but then, as luck would have it, a visit to the local book shop helped me discover "Bitter Java" and, as prophesized by "The Alchemist", when one truly starts liking something the entire universe conspires to make it happen.. it seemed so much true.. the "bitter ejb" chapter seems to have been written just for me; a quick read, together with my reports, immediately helped me identify the 'chatty' interface between our application and the database.. the frequent server roundtrips are consuming far more time over the wire than conducting any productive work... stereotypically, such problems occur when we make far more granular requests than required, or maybe execute queries in loops... it helps to find initial symptoms of the problem at the macro level.. such information can be suitably used in other low-performing use cases.. bugs come and go, but I'm accumulating debugging skills... I'm already feeling the need to study "Refactoring" by Martin Fowler.. maybe I should get back to reading.. more later

Tuesday, March 3, 2009

Performance Monitoring Utility

My current assignment required me to create a lightweight utility to conduct application monitoring during the production and development phases to identify potential performance bottlenecks. The utility which I created is now ready and is in Beta. The key rationale for creating this utility was;
  1. Create an unobtrusive (or minimal) way of monitoring the application; in its current state it does not use aspect technologies and thus requires us to manually inject code into those application components which we want to instrument.
  2. The utility should be really lightweight so that it can be used at all times to monitor application performance and system health.
  3. Simple to use. Client code should be able to use the utility in an easy way. Figuratively speaking, just place your probes at the desired parts of your application and assign a logically comprehensible name to each, that's all!
  4. Users should be able to set monitoring preferences.
  5. Reporting monitoring results.
  6. Currently the utility monitors execution times alone but can easily be extended to monitor other metrics like memory, CPU etc.
Design Considerations:

To improve simplicity, reduce verbosity and hide the inner workings of the monitor code, a facade to the monitor library is provided, so all that the client code needs to do is;

public SomeObject theMethodUnderInstrumentation() {
    // Fetch the facade before the try block so that it is visible in the finally clause
    MonitorFacade facade = MonitorFactory.getFacade();
    try {
        if (facade.isMonitorEnabled()) {
            facade.recordExecutionStart("someLogicalContextName");
        }
        SomeObject result = null;
        // do some time consuming task here and assign the outcome to result
        return result;
    } finally {
        if (facade.isMonitorEnabled()) {
            facade.recordExecutionStop("someLogicalContextName");
        }
    }
}
This is all that is required of the developer to instrument their code. Simple, isn't it!! The above becomes transparent to the developer if we use aspects to define pointcuts that inject similar code.

It is a common observation that an application performance tuning exercise requires an interdisciplinary approach, and much is needed to understand the system performance in the right context. It was therefore important to capture the complete execution path under observation. To make this possible the Composite pattern is implemented. The intent of Composite is to compose objects into tree structures to represent part-whole hierarchies. The call tree that is captured displays the function execution paths traversed in the profiled application. The root of the tree is the entry point into the application or the component. Each function node lists all the functions it called and performance data about those calls. So, to put it crudely, the Composite pattern helped me create the tree structure of application components under observation.
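
A bare-bones sketch of such a composite call-tree node could look like the following; the names and fields are illustrative only, not the actual utility;

import java.util.ArrayList;
import java.util.List;

// One node in the call tree: a monitored context plus the calls it made.
public class CallNode {
    private final String name;
    private long startNanos;
    private long elapsedNanos;
    private final List<CallNode> children = new ArrayList<CallNode>();

    public CallNode(String name) {
        this.name = name;
    }

    // Composite: a child is itself a CallNode with its own children.
    public CallNode addChild(String childName) {
        CallNode child = new CallNode(childName);
        children.add(child);
        return child;
    }

    public void start() {
        startNanos = System.nanoTime();
    }

    public void stop() {
        elapsedNanos = System.nanoTime() - startNanos;
    }

    // Print the part-whole hierarchy, indenting one level per depth.
    public void report(String indent) {
        System.out.println(indent + name + ": " + (elapsedNanos / 1000000L) + " ms");
        for (CallNode child : children) {
            child.report(indent + "  ");
        }
    }
}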

Now that we are ready with the basic infrastructure needed to monitor the code performance at all times, the next piece of the challenge is to create automated load tests, and as anyone with similar experience will immediately identify, creating automated test cases is not that big a deal but maintaining them over a period of time is.. my next task revolves around ways to put intelligence into our load test scripts... I woke up pretty early today.... ;)

Sunday, February 8, 2009

Performance monitoring - creating home grown utilities

One of my current focus areas is performance engineering, which, although an umbrella concept in itself, pretty much sums up the need to achieve the non-functional requirements defined for the software solution. You may find a neat introduction to the concept here. As much as it is important from a software engineering perspective, talking of it during design sessions is looked upon as 'over-engineering' and it is grossly ignored, saying 'It will be one nice problem to solve, but we are not there yet...'; I probably understand the product manager's need to hit the market ASAP.

As they say, 'Dates in the calendar are closer than they appear'; we normally start receiving complaints in the form of poor user experience, loss of revenue to the competition, blah, blah.. so now we have the 'nice' problem actually waiting for us to solve. Frankly, to start with, there are no straight answers, or at least I do not have one. What do we do now? Where do we start? Which are the ideal candidates for the performance tuning exercise? Here is what I would do to track the problem: place an extremely lightweight activity-monitoring utility in production at all times, because I deem it necessary to capture the usage of the application by real-life users and not us software engineers masquerading as pseudo users. [During one of my earlier assignments, the users of the application would simply close the browser window without logging out, because the application didn't contain their private data and the users just didn't like to wait for the application to log out... resulting in numerous open sessions...]

There are plenty of open source and commercially available tools in the market; still, if you feel the need to create a home-grown solution for some reason, here is the simple idea that I'm working on now to collect metrics from different parts of the application and use them for quantitative analysis of the application.

Provide ThreadLocal storage to a 'StatisticsCollector' which provides probes to be kept at key parts of the application to record statistics; to start with, I'm only collecting execution times. Create a new object each time a new thread is spawned, keep adding metrics for the components at the collection points, and add them to the tree data structure which represents the call hierarchy. Finally, once the response is committed for the thread, persist the data and reset the thread-local variables. I have primarily identified the following collection points, viz. the servlet filter, servlet do*() methods, JSPs, transactions, cache hits, and the JDBC layer; while they are few in number, they can pretty much indicate the slowest-moving layer in my application. This information will form the input to the iterative process of profiling-tuning-monitoring, gaining performance improvements upon completion of each cycle.
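
A rough sketch of that ThreadLocal plumbing is given below; the names are illustrative, and for brevity it records a flat list of samples rather than the full call tree described above;

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a per-thread statistics collector. Probes placed in the
// servlet filter, JDBC layer, etc. record execution times for named contexts;
// once the response is committed, the data is drained, persisted and reset.
public class StatisticsCollector {

    public static class Sample {
        public final String context;
        public final long elapsedMillis;

        Sample(String context, long elapsedMillis) {
            this.context = context;
            this.elapsedMillis = elapsedMillis;
        }
    }

    // One collector per thread, created lazily when the thread first records.
    private static final ThreadLocal<StatisticsCollector> CURRENT =
            new ThreadLocal<StatisticsCollector>() {
                @Override
                protected StatisticsCollector initialValue() {
                    return new StatisticsCollector();
                }
            };

    private final List<Sample> samples = new ArrayList<Sample>();

    public static StatisticsCollector current() {
        return CURRENT.get();
    }

    // Called by the probes at the collection points.
    public void record(String context, long elapsedMillis) {
        samples.add(new Sample(context, elapsedMillis));
    }

    // Called once the response is committed: hand back the data and reset the thread.
    public static List<Sample> drain() {
        List<Sample> collected = new ArrayList<Sample>(CURRENT.get().samples);
        CURRENT.remove();
        return collected;
    }
}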