Commercial vs. Open-source Data Quality Solutions

In this latest entry in the DQM blog series, I compare several open-source and commercial data quality management (DQM) tools, based on my research and experience with them.

As discussed in previous posts, DQM is a key factor in the success of any enterprise information management system. Without data quality management, big data is just a pile of data that cannot deliver real benefits to the organization. This is where data quality tools come in. Today there are a number of DQM tools on the market to choose from, and the right choice depends on the factors listed below.

Selection Criteria:

  • Cost – commercial or open-source
  • Web based or Desktop based
  • Operating system support
  • Data processing capabilities
  • Data-source types that need to be connected
  • Data format types that need to be processed
  • Data mapping and Data validation rules
  • Load testing and Error handling capabilities
  • Logging and Reporting features
  • Ease of use and Learning curve
  • Support
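One simple way to apply these criteria is a weighted scoring matrix. The sketch below is only an illustration: the weights, the 0-10 ratings and the candidate names (`tool_a`, `tool_b`) are hypothetical placeholders, not measurements of any real product, and you would replace them with your own assessment.

```python
# Hypothetical weights (1 = minor, 5 = critical) for a few of the criteria above.
# These values are assumptions for illustration; adjust them to your project.
WEIGHTS = {
    "cost": 5,
    "data_source_support": 4,
    "validation_rules": 4,
    "ease_of_use": 3,
    "support": 2,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Return the weighted average of 0-10 ratings; unrated criteria count as 0."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * ratings.get(name, 0) for name in weights)
    return weighted_sum / total_weight


# Placeholder ratings for two made-up candidates (not real product evaluations).
candidates = {
    "tool_a": {"cost": 9, "data_source_support": 6, "validation_rules": 7,
               "ease_of_use": 7, "support": 4},
    "tool_b": {"cost": 3, "data_source_support": 9, "validation_rules": 9,
               "ease_of_use": 5, "support": 9},
}

# Rank candidates from highest to lowest weighted score.
ranking = sorted(candidates, key=lambda name: weighted_score(candidates[name]),
                 reverse=True)
```

The ranking is only as good as the weights, so it is worth agreeing on them with the stakeholders before rating any tool.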

Some of the popular data integration and data quality tools available on the market are listed below:

 

| Sr. | Name | Type |
|-----|------|------|
| 1 | IBM DataStage | Commercial |
| 2 | Informatica PowerCenter | Commercial |
| 3 | Talend Data Quality Suite | Open source |
| 4 | Pentaho Kettle | Open source |
| 5 | CloverETL | Open source |

Commercial vs. Open-source Solutions:

“IBM DataStage” and “Informatica PowerCenter” are examples of commercial data quality solutions that can handle very large data volumes in complex, heterogeneous environments. These products offer comprehensive features and functionality, and therefore require extensive training to use effectively. Given the cost and effort required to implement them, they are best suited to very large, complex, enterprise-wide systems.

On the other side of the market are the open-source solutions, which have matured into viable technology alternatives. Talend, Pentaho and CloverETL are examples in this category. These solutions come in free as well as paid editions. The free editions are good enough for basic to medium-level data quality functions. The paid editions add advanced features and customer support on top of the basic features.

These solutions are well suited to mid-size to large organizations, which can start data quality initiatives without investing much in the early phases. If requirements grow, data quality teams always have the option of customizing the open-source solution or moving to its licensed edition.
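To give a feel for what "basic" data quality functions mean in practice, here is a minimal sketch of three common rule types: completeness, validity and uniqueness. It is plain Python rather than the API of any tool mentioned above. The field names (`id`, `email`), the `check_records` helper and the deliberately simple email pattern are my own assumptions for illustration.

```python
import re

# Deliberately loose pattern (an assumption for this sketch, not a full
# email validator): something@something.something with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_records(records, required=("id", "email")):
    """Return a list of (row_index, problem) tuples for basic quality rules."""
    problems = []
    seen_ids = set()
    for index, row in enumerate(records):
        # Completeness: required fields must be present and non-empty.
        for field in required:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                problems.append((index, f"missing {field}"))
        # Validity: a non-empty email must match the expected format.
        email = row.get("email")
        if email and not EMAIL_PATTERN.match(str(email)):
            problems.append((index, "invalid email"))
        # Uniqueness: the same id must not appear in more than one row.
        row_id = row.get("id")
        if row_id is not None and str(row_id).strip() != "":
            if row_id in seen_ids:
                problems.append((index, "duplicate id"))
            seen_ids.add(row_id)
    return problems


# Made-up sample rows that trigger each rule once.
sample = [
    {"id": "1", "email": "ann@example.com"},
    {"id": "2", "email": "bob-at-example.com"},
    {"id": "1", "email": "carol@example.com"},
    {"id": "3", "email": ""},
]
issues = check_records(sample)
```

DQM tools wrap rules like these in a graphical designer and add connectors, scheduling and reporting around them, which is where the differences between editions start to matter.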

| Comparison Factors | Open-source / Free | Commercial |
|--------------------|--------------------|------------|
| Cost | Free | Paid |
| Large volume data handling | Yes | Yes |
| Data mapping & validation | Yes | Yes |
| Plugins | Yes | Yes |
| Complex lookups | Yes | Yes (more options) |
| Data source/type compatibility | Yes | Yes |
| Reporting & documentation features | Limited | Yes |
| Ease of use | Open-source tools like Talend and Pentaho have matured a lot in recent years and are much easier to use now | Popular commercial tools have many features, so it takes some training and time to get a complete grasp of them |
| Support | Community only | Full support available |