Friday, June 15, 2012

SBAs - Search Based Applications



Googling for the above term (as of today) does not return quality results. However, if you search for "Search Based Applications (SBAs)" you will hit a Wikipedia entry and a few other articles that use a similar term. You will even find a book by the same name on Amazon. To sum up, it simply means that my thought process is cutting edge ;) ...come on, this is my blog, I can brag!

But I'm not kidding, the idea in itself is very simple... just use the right data structure (an inverted index) for read-intensive jobs, and it dramatically simplifies your application architecture and brings God-sent agility to your deliveries. Let me try to explain that...
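
To make the inverted index idea concrete, here is a minimal sketch in plain Java. It is only a toy: the class name InvertedIndexSketch, the tokenizer and the sample documents are made up for illustration, and a real SBA would sit on a proper search engine rather than a HashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy example: maps each term to the ids of the documents containing it.
final class InvertedIndexSketch {
  private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

  // Lower-case the text and split it on anything that is not a letter or digit.
  private static List<String> tokenize(String text) {
    List<String> terms = new ArrayList<>();
    for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
      if (!t.isEmpty()) {
        terms.add(t);
      }
    }
    return terms;
  }

  // Index time: record the document id under every term it contains.
  void add(int docId, String text) {
    for (String term : tokenize(text)) {
      postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
    }
  }

  // Query time (AND semantics): intersect the posting lists of all query terms.
  List<Integer> search(String query) {
    TreeSet<Integer> hits = null;
    for (String term : tokenize(query)) {
      TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
      if (hits == null) {
        hits = new TreeSet<>(docs);
      } else {
        hits.retainAll(docs);
      }
    }
    return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
  }

  public static void main(String[] args) {
    InvertedIndexSketch index = new InvertedIndexSketch();
    index.add(1, "Plot for sale near the new metro station");
    index.add(2, "Metro line approved by the government");
    index.add(3, "House for sale in the south of the city");
    System.out.println(index.search("metro sale")); // [1]
    System.out.println(index.search("for sale"));   // [1, 3]
  }
}
```

All the expensive work happens once at index time; a read is just a few map lookups and a set intersection, which is why this structure suits read-intensive jobs.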

In traditional database-centred applications, data stored in relational databases can only be retrieved through SQL, which is a good thing as it standardizes information access. However, let us contrast them with SBAs:

  • If the desired information is too complicated to derive from the existing data (think query complexity, how often the information is requested, scalability), a SQL-based application struggles, while SBAs can process both structured and natural language queries and then provide faceted navigation, which allows users to slice and dice data in innumerable ways.
  • If the needed query does not exist, users of a database-centred application have to wait for the next sprint to complete, whereas if you have indexed the data, the information is mostly already available. Think fast time to market, high agility!
  • Forget multiple tables; think about a scenario where you need to collate information from multiple data sources (read emails, blog entries, cvs, pdf, office documents). Simply put, if you can grok the text, SBAs can digest it and serve intelligent information later.
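
The faceted navigation mentioned above boils down to two operations: counting results per facet value, and narrowing the result set when the user picks a value. A rough sketch, assuming a hypothetical Doc type with a single "type" facet (none of these names come from a real library):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of facet counts and drill-down over a result set.
final class FacetSketch {
  // Assumed document shape: a title plus one facet field.
  static final class Doc {
    final String title;
    final String type;

    Doc(String title, String type) {
      this.title = title;
      this.type = type;
    }
  }

  // Facet counts: how many results fall under each value of "type".
  static Map<String, Long> countByType(List<Doc> results) {
    return results.stream()
        .collect(Collectors.groupingBy(d -> d.type, TreeMap::new, Collectors.counting()));
  }

  // Drill-down: keep only the results matching the facet value the user clicked.
  static List<Doc> filterByType(List<Doc> results, String type) {
    return results.stream().filter(d -> d.type.equals(type)).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Doc> results = Arrays.asList(
        new Doc("Budget notes", "email"),
        new Doc("Release plan", "pdf"),
        new Doc("Standup summary", "email"));
    System.out.println(countByType(results));                  // {email=2, pdf=1}
    System.out.println(filterByType(results, "email").size()); // 2
  }
}
```

In a real search engine the counts come back alongside the query results, so the user sees the available slices without a new query being written for each one.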


As if that was not enough motivation for us to experiment more with SBAs, here is some more:

  • SBAs are more tolerant of faulty user queries: they can make spelling suggestions and return results for synonymous queries.
  • You can integrate your SBA with a statistics-based system to identify popular entries or make recommendations using clustering and collaborative filtering.
  • You can even filter out results which are not suitable for the user; think access permissions, adult content, or content exclusively for club members.
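
As one illustration of the spelling suggestions mentioned above, here is a small sketch that picks the closest vocabulary term by Levenshtein edit distance. This is just one simple technique, not necessarily what any given search engine does internally; the class name, vocabulary and threshold are assumptions for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: suggest the indexed term closest to a misspelt query word.
final class SpellSuggestSketch {
  // Classic dynamic-programming Levenshtein distance, keeping two rows at a time.
  static int editDistance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      int[] cur = new int[b.length() + 1];
      cur[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      prev = cur;
    }
    return prev[b.length()];
  }

  // Returns the nearest vocabulary term within maxDistance edits, or null if none is close enough.
  static String suggest(String word, List<String> vocabulary, int maxDistance) {
    String best = null;
    int bestDistance = maxDistance + 1;
    for (String candidate : vocabulary) {
      int d = editDistance(word, candidate);
      if (d < bestDistance) {
        best = candidate;
        bestDistance = d;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<String> vocabulary = Arrays.asList("bangalore", "mangalore", "mysore");
    System.out.println(suggest("banglore", vocabulary, 2)); // bangalore
  }
}
```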


Mind you, all this talk is not just about results generated from a 'search text box'; the usage can be more implicit, based on the user's context, which may be driven by application-specific use cases like content browsing.

Today I've just scratched the surface, more later..

Do let me know your queries and feedback.


Friday, June 8, 2012

REALSEX

REALSEX a.k.a REAL eState EXchange

Oh! So you finally ended up being inquisitive.. You Peeping Tom!! Never mind, I was just experimenting with the idea of putting something scandalous in the blog title to attract readers.. Gotcha! :)

This blog entry is actually about my newfound big-ticket Startup !dea. Well, those of you who know me might frown.. won't you? You see, for the past few days I have been observing a rising trend of this particular keyword being thrown at me from all sides, to such an extent that if you asked me to summarize my day today in one word I would say "START-UP".

Allow me to explain. I learnt a tonne about start-ups, venture capitalists, incubation, mentors and angel investors at breakfast. Later in the day, all the way to my office, I learnt about the Karnataka Government's initiative to bring fresh investments into the state through the smiling faces of our CM on roadside makeshift hoardings. As if that was not enough, my kiddo-colleague declared that his next job title is going to be CEO of his newfound organization. Wow!

Well, I'm trying to scratch the personal itch of identifying a spot for constructing my house in Bangalore. I discovered that not only is it painstakingly difficult to identify properties which are up for sale, it is all the more difficult to verify and validate all the meta-data (read ownership, water source, sanitation, future prospects etc.) associated with a property to ascertain whether it is worth purchasing. One has to run from pillar to post to collect even the minimal data (read land records) needed to seek a loan from a bank, while the other information I mentioned above is at best ad hoc in nature and is definitely not available at a single window.

Real estate portals in their current form cater to the need for information on the availability of properties, along with related news about Govt. policies (taxation etc.) and proposals (read an upcoming metro or an IT park) which might affect property prices in the region. However, it is left entirely to the user to make intelligent decisions; the portals provide no avenue to collate this information in a digestible format. That makes it difficult for dummies like me to correlate all the information and generate some sort of index which I may use as a pointer for a purchase decision. Say, should I go to Bangalore North because the future lies with 'Aerotropolis', or should I go South because "Kanakpura Road is the new Jaynagar!!"

Would it not be better if we had a single-window, trusted source of information where we could seek answers to our queries?!

Let us now think about developing one such user platform in a phased manner:

  • Aggregate data available from dependable public sources, like rainfall patterns, vegetation, soil type, depth of water table and civic amenities, and put it on a map after geocoding.
  • Create 'event-listeners' to collect data from events like news feeds, user forums and govt. plans, and establish intelligent correlations with the above-mentioned slow-changing data to derive actionable information. For example, the Govt. approves a new metro station at 'X'; now if you overlay this information with water availability in the area on a map, it becomes pretty useful, doesn't it!
  • Create property index. 
  • Rank them for different purposes and ensure environmental sustainability in your recommendations.
  • Cultivate a community of users, information seekers, who would like to learn from your application's knowledge base.
  • Create avenues for these users to provide feedback and contribute to the community. Gamify it.
  • Allow users to publish and search individual properties.
  • Motivate users to provide meta-data. For instance, do you have rain-water harvesting on your premises? Give them more points for that, since it improves the property's value on the index.
  • Suggest corrective actions to improve a property's ranking, which may trigger improvements at a macro level. For example, if citizens in a particular locality realize that improving cleanliness will fetch better returns on their investments, they might as well improve general hygiene in the entire locality.
  • ...
Simply put, the platform is required to ensure high-quality governance of the ecosystem so created.
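
To give a feel for the "property index" and "rank them for different purposes" steps, here is a very rough sketch that scores properties as a weighted sum of factors. Everything in it is hypothetical: the factor names, the 0 to 1 scores and the weights are placeholders for illustration, not real data or a worked-out model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a weighted property index; factor names and numbers are made up.
final class PropertyIndexSketch {
  // Weighted sum of factor scores (each assumed to be in [0, 1]); a missing factor counts as 0.
  static double score(Map<String, Double> factors, Map<String, Double> weights) {
    double total = 0.0;
    for (Map.Entry<String, Double> w : weights.entrySet()) {
      total += w.getValue() * factors.getOrDefault(w.getKey(), 0.0);
    }
    return total;
  }

  // Rank property names, best first, for one purpose expressed as a set of weights.
  static List<String> rank(Map<String, Map<String, Double>> properties,
      Map<String, Double> weights) {
    List<String> names = new ArrayList<>(properties.keySet());
    names.sort(Comparator
        .comparingDouble((String n) -> score(properties.get(n), weights))
        .reversed());
    return names;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Double>> properties = new HashMap<>();
    Map<String, Double> north = new HashMap<>();
    north.put("water", 0.2);
    north.put("metro", 0.9);
    Map<String, Double> south = new HashMap<>();
    south.put("water", 0.8);
    south.put("metro", 0.4);
    properties.put("north-plot", north);
    properties.put("south-plot", south);

    // Two purposes, two weightings of the same underlying factors.
    Map<String, Double> forLiving = new HashMap<>();
    forLiving.put("water", 0.7);
    forLiving.put("metro", 0.3);
    Map<String, Double> forInvestment = new HashMap<>();
    forInvestment.put("water", 0.2);
    forInvestment.put("metro", 0.8);

    System.out.println(rank(properties, forLiving));     // [south-plot, north-plot]
    System.out.println(rank(properties, forInvestment)); // [north-plot, south-plot]
  }
}
```

The same factor data produces different rankings once the weights change, which is the whole point of ranking "for different purposes".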

Now the impatient among you might already want to return from the inception point and see the money. OK, let me try my hand at explaining the business plan as well.
  • Free for all, always!
  • Multilevel premium charges for activities like identifying potential properties (organic recommendations, i.e. no promotions), generating fact-based reports (advisories) on identified properties, etc.
  • Advertisements
    • Service providers say plumbers or builders
    • Classifieds
Was it too big for an elevator pitch? Nevertheless, it feels good to see light at the end of the tunnel. I feel like God! .. err I must sleep now as I have a job to keep tomorrow in lieu of 'Bharat Bandh'

Do share your feedback, because it is important.

Thursday, June 7, 2012

Mongodb Paging Item Reader


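Here is a sample Spring Batch paging item reader backed by MongoDB. QueryContext (not shown) just holds the query, the entity class and the collection name.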

package org.zero.index.batch.read;

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.database.AbstractPagingItemReader;
import org.springframework.data.mongodb.core.MongoOperations;
import org.springframework.util.Assert;

public class MongodbPagingItemReader<T> extends AbstractPagingItemReader<T> {
  private MongoOperations mongoOperations;
  private QueryContext<T> queryContext;
  private boolean strict;

  private static final Logger LOG = LoggerFactory
      .getLogger(MongodbPagingItemReader.class);

  @Override
  protected void doReadPage() {
    // Reuse the page buffer across calls; allocate it on first use.
    if (results == null) {
      results = new CopyOnWriteArrayList<T>();
    } else {
      results.clear();
    }
    // skip() takes a number of documents, not a page number, so convert the
    // zero-based page index into an offset.
    final List<T> result = mongoOperations.find(
        queryContext.getQuery().limit(getPageSize())
            .skip(getPage() * getPageSize()),
        queryContext.getEntityClassName(), queryContext.getCollectionName());
    // An empty first page means the query matched nothing at all.
    if ((getPage() == 0) && result.isEmpty()) {
      if (strict) {
        throw new IllegalStateException(
            "No matching documents found (reader is in 'strict' mode)");
      }
      LOG.warn("No matching documents found");
      return;
    }
    results.addAll(result);
  }

  @Override
  protected void doJumpToPage(final int itemIndex) {
    // Nothing to do here: doReadPage() works out what to skip from getPage().
  }

  /**
   * @param mongoOperations
   *          the mongoOperations to set
   */
  public void setMongoOperations(final MongoOperations mongoOperations) {
    this.mongoOperations = mongoOperations;
  }

  /**
   * @param queryContext
   *          the queryContext to set
   */
  public void setQueryContext(final QueryContext<T> queryContext) {
    this.queryContext = queryContext;
  }

  /**
   * @param strict
   *          if true, fail with an exception when the first page is empty
   */
  public void setStrict(final boolean strict) {
    this.strict = strict;
  }

  @Override
  public void afterPropertiesSet() throws Exception {
    super.afterPropertiesSet();
    Assert.notNull(mongoOperations, "mongoOperations is null");
    Assert.notNull(queryContext, "queryContext is null");
  }
}
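
The one piece of arithmetic worth spelling out in a paging reader like this is the offset: page N starts after N * pageSize documents. A tiny standalone sketch that emulates skip/limit over an in-memory list (the class and method names are made up for illustration and are not part of Spring Batch or Spring Data):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emulates skip/limit paging over an in-memory list.
final class PagingOffsetSketch {
  // Returns the zero-based page of the given size, or an empty list past the end.
  static <T> List<T> page(List<T> all, int page, int pageSize) {
    int from = Math.min(page * pageSize, all.size()); // documents to skip
    int to = Math.min(from + pageSize, all.size());   // apply the limit
    return new ArrayList<>(all.subList(from, to));
  }

  public static void main(String[] args) {
    List<Integer> docs = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      docs.add(i);
    }
    System.out.println(page(docs, 0, 4)); // [0, 1, 2, 3]
    System.out.println(page(docs, 1, 4)); // [4, 5, 6, 7]
    System.out.println(page(docs, 2, 4)); // [8, 9]
  }
}
```

Skipping only the page number instead of page * pageSize would make consecutive pages overlap, so the same documents would be read more than once.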