Report Mining with Rover

Three years before becoming Mayor of New York City, Michael Bloomberg wrote a brief yet astute opening editorial in his Bloomberg Personal Finance magazine. In that article, Bloomberg reminded readers of a simple but often overlooked fact: the best technology ideas typically have the fewest bells and whistles. The humorous graphic that accompanied Bloomberg’s article complemented his point quite well. The picture depicted a man wearing virtual reality headgear, connected intricately to a small, complex robotic arm. The arm in turn held…the man's toothbrush. How’s that for progress? Let's just hope a system crash in the bathroom doesn’t force this poor fellow to go to work with bad breath!

Bloomberg quoted a Japanese word for such silly technology overkill: chindogu. It refers to technology that is so complex and so over-specialized that it arguably complicates life as much as it improves it (1).

The term chindogu applies not only to the world of consumer electronics, but also to the world of business intelligence: the systems and software from which managers and workers get the data they need to do their jobs effectively. Too often, these solutions end up in the chindogu category. In many organizations it is still surprisingly difficult for end users to access and work with urgently needed information. Software solutions intended to let end users get their own data prove to be maddeningly complex. Frustrated end users swamp the IT department with requests for custom data. End users also resort to re-keying data from thick printed reports into spreadsheets for analysis. In fact, workers in many organizations print gigantic reports just to read a few numbers, and then throw the whole report away!

The bottom line: Now more than ever, businesses need greater access to more information more quickly to make decisions. But are they getting it? According to InformationWeek, despite collectively spending billions on data management and business intelligence tools, the answer in many cases is no (2). Instead, the unfortunate end result is that many business intelligence solutions somehow manage to cause more work, both for end users and the IT department.

The challenge to organizations is clear: empower users to easily access information in true self-service fashion, without merely adding an expensive layer of chindogu technology that misses the mark. How to do it is the tough part.

The Possible Avenues for Information Delivery

In this white paper, we’ll look at three common approaches to delivering corporate information today, and introduce a hybrid approach called report mining, which borrows some of the best features from each.

Hardcopy Reports: Still a Staple of Corporate Information

Far and away the most frequently used delivery vehicle for corporate information remains that old standby, the printed "hardcopy" report. Even today, thick reports are printed, reprinted and perhaps even photocopied within virtually every organization, making printed reports one of the main drivers of voracious paper use in businesses worldwide. According to the American Forest & Paper Association, the paper industry distributed 1.7 million tons of office paper in 1982. By 1997, that amount had soared to 4.6 million tons. And today, the Worldwatch Institute estimates the average American office worker plows through approximately 12,000 sheets of paper per year (3). Of course, the cost of hardcopy reports to an organization goes well beyond paper, to include substantial printing, shipping and maintenance costs.

Despite major advances in office technology, office paper consumption continues to grow at an astonishing pace. In an age in which open database technology is taken for granted, one would expect hardcopy reports to have lost favor years ago. On the contrary, reports are still gaining momentum: an in-depth survey in 2000 revealed that office paper consumption in the U.S. grew steadily, year over year, from 1995 through 1999 (4). Arguably, printed reports remain one of the fastest-growing methods of information delivery.

Hardcopy Reports

Pros:
  • An existing informational asset: no new programming work required
Cons:
  • Voracious waste of paper, often loaded with superfluous data
  • Can’t work interactively with the data
  • May not be possible to re-run old reports from past periods

A Hybrid Approach:

Fortunately, most reports are actually quite useful. After all, they embody a tremendous investment of hard work. Many reports are carefully crafted documents that turn “raw” data from multiple database tables into truly valuable information. On rare occasions, a report can be a masterpiece. More often, however, reports contain too much information.

Everything that anybody might possibly need is included; one size fits all. And, of course, all the information is frozen on the printed page. There seems to be no way to get at that data without digging through mountains of printouts and wasting still more precious time manually rekeying data from reports into spreadsheets or other applications, simply to enable analysis or presentation. Chances are, such manual “information gathering” activities are going on in your own organization right now, costing you real money and real profits.

For all their shortcomings, hardcopy reports still serve a vital need. Day in and day out, they remain the currency of corporate information. Technologies designed to access databases directly have always promised to supplant printed reports, yet more reports are being printed than ever. Why? To answer this question, we will now look more closely at direct database access as a means of providing end users with the data they need.

Life in the Slow Lane: Using Database Query Tools Against Operational Databases

Querying operational databases directly for information remains a tantalizing concept. Vast deposits of data sit in database tables, just waiting to be tapped. If PC users could be equipped with data mining tools, an information bonanza would soon result. Such were the predictions following the advent of open databases and SQL (Structured Query Language), the industry-standard language for querying data in open databases. SQL-based applications seek to eliminate the need to print "canned" reports or design complex customized reports by providing direct access to production databases. Vendors offering such applications reason that managers can then request the exact information they need anytime, on an ad hoc basis.

Powerful SQL-based data mining tools have now been around for many years, but, as noted earlier, organizations are printing more reports than ever before. Why? The problem, essentially, is that the data SQL-based solutions access often sits in operational systems, structured to support fast transaction processing, not queries and analysis.

DATA QUERY TOOLS – Applied to Operational Databases

Pros:
  • Delivers live data to users
  • Enables interactive analysis
Cons:
  • Frequent or complex data queries slow down operational systems, often dramatically
  • Tools often prove too difficult for end users to work with
  • Potential security issues associated with granting direct access to core databases

Operational Systems versus Informational Systems

Operational systems (also known as production systems) are the systems that run the enterprise day to day. These are the backbone systems of any enterprise: order entry, inventory, manufacturing, payroll and accounting (which together can also form part of a complete ERP system). Every organization also has a second type of business system: informational systems, which support analyzing data and making decisions, often major decisions, about how the enterprise will operate, now and in the future (5).

Operational systems are designed to process and store data from the organization’s day-to-day transactions quickly and efficiently, not to support analysis and decision making. For example, when a sales representative enters a new sales order, the ordering system accepts the new data almost immediately. This fast performance is a direct result of the operational data being highly normalized: the data is stored efficiently across many separate operational database tables, to avoid storing duplicate values and to expedite entry and retrieval of individual transactions.

But the normalized structure of operational databases presents a significant downside when trying to access data from these systems for analysis and decision making. Constant data-digging in production (operational) databases often slows applications (6). In other words, the processing resources an operational system devotes to ad hoc data queries, including “joining” huge volumes of data from many different database tables, dramatically slow down the system’s ability to do what it was designed to do: process new transactions. As Wayne Eckerson, director of education and research at The Data Warehousing Institute, commented, “Companies don’t want to risk letting a user issue a complex, data-intensive query that brings their production systems – and business – to a standstill” (7).
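To make the contrast concrete, the following minimal Python sketch uses the standard sqlite3 module and a hypothetical three-table order-entry schema (all table and column names are invented for illustration). Entering an order touches only a handful of rows, while even a simple analytical question, revenue by region, must join every table and scan every order line; on a production system holding millions of lines, that scan competes with new transactions.

```python
import sqlite3

# Hypothetical normalized order-entry schema: each fact is stored exactly once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    CREATE TABLE order_lines (order_id INTEGER, sku TEXT, qty INTEGER, unit_price REAL);
    INSERT INTO customers VALUES (1, 'Acme', 'East'), (2, 'Bolt', 'West');
""")

def enter_order(order_id, customer_id, order_date, lines):
    """Transaction processing: a few small inserts and no joins."""
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                     (order_id, customer_id, order_date))
        conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?, ?)",
                         [(order_id, sku, qty, price) for sku, qty, price in lines])

def revenue_by_region():
    """Ad hoc analysis: joins all three tables and scans every order line."""
    return conn.execute("""
        SELECT c.region, SUM(l.qty * l.unit_price) AS revenue
        FROM order_lines AS l
        JOIN orders    AS o ON o.order_id = l.order_id
        JOIN customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()

enter_order(100, 1, "2002-04-01", [("A1", 2, 10.0), ("B2", 1, 5.0)])
enter_order(101, 2, "2002-04-02", [("A1", 3, 10.0)])
enter_order(102, 1, "2002-04-03", [("C3", 4, 2.5)])
print(revenue_by_region())   # [('East', 35.0), ('West', 30.0)]
```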

Manager Trying to Use a SQL Query Tool: "I’d Like to Phone a Friend."

Database query tools generally offer administrator settings that automatically reject a rogue query before it brings the operational system to a near standstill, but such settings do little to address end user frustration. Formulating proper data queries requires managers not only to avoid asking for “too much” data, but also to know which database tables and specific data fields they want to extract data from. This is not always clear: the data fields available to the end user are often numerous, and their names sometimes seem synonymous. Is the end user selecting the data he or she intended to select? According to InformationWeek, while suppliers of business intelligence applications have been saying for years that their products bring data analysis to front-line workers, some say a Ph.D. in statistics and experience in SQL programming are all but required to use the software (8).
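As one hedged illustration of such an administrator setting, SQLite, through Python's sqlite3 module, lets a connection install a progress handler that is called roughly every N virtual-machine instructions; returning a non-zero value interrupts the running statement. The budget numbers below are arbitrary assumptions, and commercial query tools implement their own, different governors.

```python
import sqlite3

CHECK_EVERY = 1_000      # SQLite VM instructions between governor callbacks
MAX_STEPS = 200_000      # assumed per-query budget; real limits are site-specific

def run_guarded(conn, sql, params=()):
    """Run one query, aborting it if it exceeds the instruction budget."""
    steps = 0

    def governor():
        nonlocal steps
        steps += CHECK_EVERY
        # Returning a non-zero value tells SQLite to interrupt the statement.
        return 1 if steps > MAX_STEPS else 0

    conn.set_progress_handler(governor, CHECK_EVERY)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.set_progress_handler(None, 0)   # remove the governor again

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

print(run_guarded(conn, "SELECT COUNT(*) FROM t"))   # [(1000,)]

try:
    # An unconstrained self-join: a million row pairs, far over budget.
    run_guarded(conn, "SELECT COUNT(*) FROM t AS a, t AS b")
except sqlite3.OperationalError as exc:
    print("query rejected:", exc)
```

The governor protects the operational system, but the manager whose query was refused is no closer to an answer.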

Managers unable or unwilling to figure out how to produce the correct ad hoc data query themselves might tap a data query tool “power user,” often within the IT department, for help. As a result, the IT department still finds itself in the custom report-writing/data query business, using better tools than previous generation products, but still facing the same burgeoning backlog of time-sensitive requests for custom information.

In many cases, direct end user access to data yields unsatisfactory results. Operational systems suffer under the heavy load of query processing, managers spend too much time simply trying to build the correct query for the information they need, and the data query burden often still falls on the IT department. “Because of the inherent differences between operational and informational systems,” says Eckerson, “many companies choose to build separate databases to house decision support data.” (9) These separate information-oriented databases are called data warehouses, or their smaller cousins, data marts.

Welcome To The Data Warehouse…Hardhats Required

To realize the promised benefits of high-end analytical database solutions, companies usually need a data warehouse: a database whose sole purpose is to serve up presorted data, with a price tag that, according to Forbes magazine, can run to $2 million or more (10). A data warehouse is a separate “place” for corporate data, designed to serve as an optimized informational system. It is intended to provide access to data for analysis and decision making without impacting operational systems.

To start building the data warehouse, “raw” data is extracted from the numerous tables within operational databases at predetermined intervals. Data from these many tables is frequently combined into one or perhaps just a few monolithic tables, a process known as denormalization. Combining data from different tables correctly requires a well-planned Extraction, Transformation and Loading process (“ETL”). Proper ETL ensures that data from different operational tables, and perhaps from completely different operational systems, is combined accurately. ETL is typically the most complex and time-consuming part of a data warehousing project, and also the most error-prone.
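The three ETL stages can be sketched in a few lines of Python with the standard sqlite3 module. This is a minimal sketch under stated assumptions: the source schema, the single `sales_fact` table and the derived `amount` column are all hypothetical, and a real ETL process adds scheduling, incremental loads, error handling and key mapping across systems.

```python
import sqlite3

# Hypothetical operational (source) database: normalized tables.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE products    (sku TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    CREATE TABLE order_lines (order_id INTEGER, sku TEXT, qty INTEGER, unit_price REAL);
    INSERT INTO customers   VALUES (1, 'Acme', 'East'), (2, 'Bolt', 'West');
    INSERT INTO products    VALUES ('A1', 'Widget'), ('B2', 'Gadget');
    INSERT INTO orders      VALUES (100, 1, '2002-04-01'), (101, 2, '2002-04-02');
    INSERT INTO order_lines VALUES (100, 'A1', 2, 10.0), (100, 'B2', 1, 5.0), (101, 'A1', 3, 10.0);
""")

# Separate warehouse (mart) database: one wide, denormalized fact table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("""
    CREATE TABLE sales_fact (order_id INTEGER, order_date TEXT, customer TEXT,
                             region TEXT, product TEXT, qty INTEGER, amount REAL)""")

def extract(conn):
    """Extract: join the normalized tables once, at a scheduled interval."""
    return conn.execute("""
        SELECT o.order_id, o.order_date, c.name, c.region, p.description,
               l.qty, l.unit_price
        FROM order_lines AS l
        JOIN orders    AS o ON o.order_id = l.order_id
        JOIN customers AS c ON c.customer_id = o.customer_id
        JOIN products  AS p ON p.sku = l.sku""").fetchall()

def transform(rows):
    """Transform: derive the columns analysts need (here, the line amount)."""
    return [(oid, date, cust, region, prod, qty, round(qty * price, 2))
            for oid, date, cust, region, prod, qty, price in rows]

def load(conn, rows):
    """Load: bulk-insert the denormalized rows into the fact table."""
    with conn:
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

load(warehouse, transform(extract(source)))

# Analysts now query one table: no joins and no load on the source system.
print(warehouse.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall())   # [('East', 25.0), ('West', 30.0)]
```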

DATA QUERY TOOLS – Applied to a Data Warehouse or Mart

Pros:
  • Delivers live data to users
  • Enables interactive analysis
  • Avoids adversely affecting operational system performance
Cons:
  • Data warehouse/mart creation very expensive, even if done right
  • High risk of total project failure
  • End user tools remain too complex

For large global corporations with large centralized IT departments and large capital expense budgets, the gigantic price tag of a successfully implemented data warehouse, backed by a successful ETL process, makes great business sense. Such organizations can realize a compelling return on investment (ROI) from a data warehouse despite the high costs. Examples include using a data warehouse to analyze perhaps multiple terabytes of detailed customer spending data to produce targeted, customized special offers, or to reliably predict production and sales patterns, cutting excess inventory worldwide.

Obviously, few organizations can afford the Herculean financial and man-hour demands of a full-blown data warehouse, nor do they have, or want to manage, multiple terabytes of data. However, many technologists argue that the data warehouse concept can be “right-sized” to meet the needs and budget limitations of midsize businesses by creating smaller, tactical stores of data, referred to as data marts. Organizations considering such an approach to getting needed data to their users must nevertheless recognize that the risks associated with a vast data warehouse still lurk when building a smaller data mart.

Beware the “Data Outhouse”

First, the data warehouse landscape, even at businesses of Fortune 500 proportions, is littered with colossal failures. Such failures can often be blamed on a faulty ETL process: data from different systems is mapped together improperly; some needed data is not included in the data warehouse at all; or just plain wrong data is included because of miscommunication between end users and IT personnel. The result: a “data outhouse,” a label of failure applied to a data warehouse with “dirty” (incorrect and therefore worthless) data stored inside.

Another key problem with data warehouses or data marts is that the data going into them is, as previously discussed, raw data residing in numerous tables within operational systems: “low grade ore,” in effect. This “raw” data generally lacks integrity, because business rules have not yet been applied to it. Business rules might include certain calculations that make the “raw” data useful and correct for analysis and decision making. Data warehouse builders must understand these business rules completely and perform the necessary calculations on the data as part of the ETL process, before loading it into the data warehouse or mart. Failure to do so will almost certainly result in a data outhouse. “A word to the wise: Never assume that (operational) source data has integrity,” warned DM Review columnist Gary Clark (11).
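A minimal Python sketch of this transform step, assuming a hypothetical `RawLine` record and two invented business rules (cancelled orders are excluded; net amount is computed after the line discount). Rows that fail basic integrity checks are set aside for review rather than silently loaded, in the spirit of Clark's warning.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RawLine:
    """A hypothetical raw order line as extracted from an operational system."""
    order_id: int
    customer_id: Optional[int]
    qty: int
    unit_price: float
    discount_pct: float   # assumed to be stored as 0-100 in the source
    status: str           # assumed codes: "OPEN", "SHIPPED", "CANCELLED"

def apply_business_rules(lines):
    """ETL transform step: validate raw rows, then apply (assumed) business rules.

    Returns (clean_rows, rejects); rejected rows are reported, never loaded.
    """
    clean, rejects = [], []
    for line in lines:
        # Integrity checks: never assume operational source data is clean.
        problems = []
        if line.customer_id is None:
            problems.append("missing customer")
        if line.qty <= 0:
            problems.append("non-positive quantity")
        if not 0 <= line.discount_pct <= 100:
            problems.append("discount out of range")
        if problems:
            rejects.append((line.order_id, problems))
            continue
        # Rule 1 (assumed): cancelled orders do not count as revenue.
        if line.status == "CANCELLED":
            continue
        # Rule 2 (assumed): net amount is the gross amount less the line discount.
        net = round(line.qty * line.unit_price * (1 - line.discount_pct / 100), 2)
        clean.append({"order_id": line.order_id,
                      "customer_id": line.customer_id,
                      "net_amount": net})
    return clean, rejects

raw = [
    RawLine(1, 10, 2, 50.0, 10, "SHIPPED"),    # valid: net 90.00
    RawLine(2, 11, 1, 20.0, 0, "CANCELLED"),   # valid, but excluded by rule 1
    RawLine(3, None, 1, 20.0, 0, "OPEN"),      # rejected: no customer
    RawLine(4, 12, -1, 20.0, 0, "OPEN"),       # rejected: bad quantity
]
clean, rejects = apply_business_rules(raw)
print(clean)
print(rejects)
```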

Another challenge is making sure the data the organization really needs ends up in the data warehouse or mart. As data warehousing specialist and author Nancy Mullen wrote, placing “all” enterprise data into a monolithic data warehouse, with little or no regard for what is truly needed and what is superfluous, leads to a “junk drawer” mentality: the resulting data warehouse looks like an overstuffed drawer. Users may feel overwhelmed and find the whole warehouse unusable (12).

So far this paper has addressed just a few key challenges associated only with “getting data into” a data warehouse properly. Many more potential “gotchas” lurk elsewhere in the data warehouse/data mart process, from properly maintaining data already in the warehouse (i.e., data corrections and updates) to effectively “getting the data out” to end users. The latter means selecting user-friendly and efficient end user analytical tools (as previously noted, business intelligence tools are often still too complex for nontechnical people to use (13)), training end users properly, and taking numerous additional steps to ensure the data warehouse or mart is actually used!

However, further elaboration of the potential pitfalls of data warehousing/data marts may be unnecessary for many midsize organizations, and even some large organizations. Data warehousing/data mart solutions may fall into that chindogu category of technology for these organizations, simply because the existing business system infrastructure is already relatively simple.

According to author Larry Greenfield, who operates the Data Warehousing Information Center website, data warehousing may not be for your business if any of these conditions apply to its business needs:

  • You need to report on data in one transaction processing system, and/or
  • The historical data you need are in that system, and/or
  • The data in the system are clean, and/or
  • The structure of the system data is relatively simple, and/or
  • Your firm does not have much interest in end user ad hoc query/report tools.

Greenfield adds, “Once you get away from the big ‘Fortune 500, centralized IS’-type shops most of the data warehousing vendors slant their marketing to, these conditions describe the reporting needs of many firms.” (14)

Putting it All Together: The Report Mining Alternative

Meanwhile, paper reports continue to proliferate. Consider this intriguing quote from a management accounting team leader at a company with a major enterprise data warehousing project in place: “Paper reports are still floating around, and I'd love to pilfer through (end user) desks and see what information they're actually looking at.” (15) And consider recent comments from TruServe CIO Neil Hastie, who, when asked by InformationWeek how companies make business decisions today, responded: “A lot of by-guess and by-golly, a lot of by-gut, and a whole lot of paper reports.” (16)

Again and again, we see executives, managers and workers still resorting to printed reports for information, despite huge efforts to provide direct access to data, either in operational systems or in new data warehouses and marts. Even after all the new business intelligence technology introduced over the past ten years or more, one fact is undeniable: printed reports steadfastly refuse to leave the business landscape. End users still cannot do without them!

This fact leads us to some key questions:

  • Given that reports are so pervasive, why not try to gain some brand new value from them?
  • Instead of merely printing reports, what if the tried and true data buried within the report files themselves could be transformed into live data?
  • What if end users could easily access the data within report files, and then easily sort, filter, analyze and export the data, without programming?

Imagine if users could easily obtain the data they need using existing report files as a “proxy” for the data within the company’s operational systems!

As Bob Moran of Aberdeen Group recently commented, “One of the biggest values of business intelligence tools is their ability to help business users to do their jobs better – a value that the majority of users in most organizations could attain if they could quickly find and use the information already contained in their organization's reports.”

The idea is actually quite simple: instead of mining data from a central database, users mine data buried in reports. Any report used in your organization can be run not to a printer, but to a report (print) file, and then mined. Barr Systems’ latest product, Rover, is a report mining solution that makes transforming existing reports into live data possible.

Using report mining tools such as Rover, users of any technical skill level can mine the massive amounts of data already captured in the organization’s existing reports, rather than trying to get the information out of production systems or waiting for deployment of data warehouses. No additional programming work of any kind is required to get actionable data into the hands of managers.

Using report files as a source of data offers many advantages: data is instantly available to end users, with no programming, data queries or data mart preparation. Every report produced by an organization now represents a ready-made “mini-database” that report mining tools can exploit. These reports contain data already retrieved from multiple tables in the operational system, complete with any relevant business rules already reflected in the final data presented in the report. “The real value of report mining tools,” says Gartner Group analyst Howard Dresner, “comes from their ability to promote access and analysis of report information directly, without an intermediary transformation process.”
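As a minimal sketch of the general idea, and not a description of Rover's implementation, the Python below mines a hypothetical accounts-receivable print file. A regular expression matches only detail lines with an assumed column layout, so page headings, column rules and the totals line are skipped; the report's own TOTALS line then serves as a free integrity check on the mined data.

```python
import re

# Hypothetical accounts-receivable print file (fixed-width "line printer" text).
REPORT = """\
GENERAL HOSPITAL            ACCOUNTS RECEIVABLE AGING          PAGE   1
RUN DATE 04/30/02

ACCOUNT   PATIENT NAME          CURRENT    31-60    61-90
--------  --------------------  --------  -------  -------
10001     SMITH, JOHN             120.00    45.00     0.00
10002     JONES, MARY               0.00   300.00   125.50
                       TOTALS     120.00   345.00   125.50
"""

AMOUNT = r"-?[\d,]+\.\d\d"
# Assumed layout of a detail line: 5-digit account, name, three aged amounts.
DETAIL = re.compile(
    rf"^(?P<account>\d{{5}})\s+(?P<name>.+?)\s+"
    rf"(?P<current>{AMOUNT})\s+(?P<d31_60>{AMOUNT})\s+(?P<d61_90>{AMOUNT})\s*$"
)
TOTALS = re.compile(rf"^\s+TOTALS\s+({AMOUNT})\s+({AMOUNT})\s+({AMOUNT})\s*$")
BUCKETS = ("current", "d31_60", "d61_90")

def to_number(text):
    return float(text.replace(",", ""))

def mine_report(text):
    """Turn detail lines into records; headings and rules are ignored."""
    records, totals = [], None
    for line in text.splitlines():
        if m := DETAIL.match(line):
            rec = m.groupdict()
            for b in BUCKETS:
                rec[b] = to_number(rec[b])
            records.append(rec)
        elif m := TOTALS.match(line):
            totals = [to_number(g) for g in m.groups()]
    return records, totals

records, totals = mine_report(REPORT)
# Integrity check: the mined detail must add up to the report's own totals.
mined = [round(sum(r[b] for r in records), 2) for b in BUCKETS]
assert mined == totals, (mined, totals)
for r in records:
    print(r)
```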

Existing reports are fully leveraged. Report mining tools leverage the reports already available in the organization, including “canned” reports already offered as part of existing, paid-for operational systems (ERP, supply chain, industry-specific solutions, etc.), as well as additional reports already developed by IT. Report mining tools therefore bring to life an existing, previously untapped and, in effect, free source of informational data.

Users are immediately productive. The concept of extracting data out of familiar reports makes intuitive sense to business users, and lets them apply their business knowledge immediately. Instead of seeking to replace existing printed reports, Rover eliminates the long-standing problems associated with them by transforming reports from static, inflexible documents into a new source of live data that can be used effectively for analysis and decision support. Rover also leverages the long-standing advantage of printed reports: they already exist and can be produced with NO additional outlay of IT funds or man-hours! This is the key focus of report mining, the process of extracting, parsing and analyzing data within existing reports.

For example, report mining tools can fully leverage:

  • the Accounting Manager’s knowledge of the company’s General Ledger and other accounting report content
  • the Operations Analyst’s knowledge of the company’s Inventory Management report content
  • the Payroll Director’s knowledge of the Payroll Register content

Empowered with Rover, these business users are best able to apply their business knowledge to produce, on their own, the data they know they need to manage the organization effectively. Users can create their own custom views and analyses of data, in true self-service fashion. Effective report mining tools not only convert reports into data, but also let the end user interactively analyze that newly created data, for example to:

  • Sort and filter data
  • Add new calculated fields of data, using formulas and functions
  • Produce summaries of data, with subtotals, grand totals, and automatic graphs
  • Open multiple copies of the same report in one report mining session for trending and other analysis (such as analyzing three monthly reports at the same time to perform a quarterly analysis)
  • Work with data from reports, alone or in combination with data from another source, such as a spreadsheet, database file, ODBC data source, etc.
  • Export desired data to MS Excel, Access, and other applications anytime
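Once report data has been mined into records, each of the operations listed above takes only a few lines of ordinary code. The Python sketch below assumes hypothetical records mined from three monthly inventory reports; CSV output stands in for export to Excel or Access, both of which can import CSV files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical records already mined from three monthly inventory reports.
months = {
    "2002-01": [{"sku": "A1", "site": "East", "on_hand": 40, "unit_cost": 2.50},
                {"sku": "B2", "site": "West", "on_hand": 10, "unit_cost": 9.00}],
    "2002-02": [{"sku": "A1", "site": "East", "on_hand": 30, "unit_cost": 2.50},
                {"sku": "B2", "site": "West", "on_hand": 12, "unit_cost": 9.00}],
    "2002-03": [{"sku": "A1", "site": "East", "on_hand": 20, "unit_cost": 2.50},
                {"sku": "B2", "site": "West", "on_hand": 15, "unit_cost": 9.00}],
}

# Combine the three monthly reports into one table, tagging each row's period.
rows = [dict(r, month=m) for m, recs in months.items() for r in recs]

# Calculated field: extended value = quantity on hand x unit cost.
for r in rows:
    r["value"] = r["on_hand"] * r["unit_cost"]

# Filter and sort: West site only, newest month first.
west = sorted((r for r in rows if r["site"] == "West"),
              key=lambda r: r["month"], reverse=True)

# Summarize: quarterly subtotal of value per SKU, plus a grand total.
subtotals = defaultdict(float)
for r in rows:
    subtotals[r["sku"]] += r["value"]
grand_total = sum(subtotals.values())

# Export: write the combined rows as CSV (to a string here; a file works the same).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["month", "sku", "site", "on_hand", "unit_cost", "value"])
writer.writeheader()
writer.writerows(rows)

print(dict(subtotals), grand_total)   # {'A1': 225.0, 'B2': 333.0} 558.0
```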

Such user-driven analysis is often easier to perform with report mining tools, because it begins from a starting point that all managers and workers already understand: the printed reports they already know. As a result, report mining tools offer an inherently shorter learning curve than database-focused business intelligence solutions.

Report mining can be used in virtually any computing environment. Plain ASCII “line printer” text is a “lowest common denominator” output format offered by virtually every operational system, no matter how new or old the system is. The report mining tool does not need to “know” what databases or systems the reports came from; the only requirement is the existence of a report file.
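A minimal Python sketch of reading such a file with no knowledge of the system that produced it. It assumes pages are separated by form-feed characters, which is common in line-printer output but not universal (mainframe print files, for example, may use carriage-control characters in column 1 instead); the function name `read_report_pages` is hypothetical.

```python
import os
import tempfile
from pathlib import Path

def read_report_pages(path, encoding="ascii"):
    """Read a line-printer report file and return its pages as lists of lines.

    Assumes form-feed characters separate pages. Undecodable bytes are
    replaced rather than aborting, and trailing whitespace (including the
    carriage return of CRLF line endings) is stripped from each line.
    """
    text = Path(path).read_bytes().decode(encoding, errors="replace")
    pages = text.split("\f")
    return [[line.rstrip() for line in page.split("\n") if line.strip()]
            for page in pages if page.strip()]

# Usage: a tiny two-page print file with Windows-style line endings.
sample = b"HEADER P1\r\n10001 SMITH 120.00\r\n\fHEADER P2\r\n10002 JONES 0.00\r\n"
with tempfile.NamedTemporaryFile("wb", suffix=".prn", delete=False) as f:
    f.write(sample)
pages = read_report_pages(f.name)
os.remove(f.name)
print(len(pages), pages[1])   # 2 ['HEADER P2', '10002 JONES 0.00']
```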

Workload demands upon IT are reduced. Widespread end user adoption of report mining tools means significantly fewer demands on IT to produce custom reports or data queries, because users can create new views of data sourced from existing reports. Report mining tools are easy to use and deliver immediate value to end users while reducing the workload of IT.

Report and data query demands on operational systems are reduced. Since report mining tools can produce a wide variety of custom views from a single report, fewer separate end user requests for the same report or data query need to be run, reducing CPU cycles and other resource drains on the organization’s operational systems.

Data security is maintained. Since report mining tools read report files and not the central database, operational data stays secure and out of reach.

Extensive use of report mining tools will notably reduce company paper and printing costs. With widespread use of report mining tools, reports are much more valuable in their “softcopy” format (i.e., as a report file that can be converted into data) than if they are printed on paper. As a result, organizations that use Rover can’t help but realize lower report paper and printing expenditures on an ongoing basis.

Real-World Experience: Report Mining in Action 

We will review how Rover can be used to enable web-based report mining solutions for the enterprise. When you implement Rover, there will be no need to waste time flipping through hardcopies of reports, or trying to understand the underlying database structure. You will not need to spend substantial time rekeying data from hardcopy reports into a spreadsheet just to begin rudimentary analysis. You can use Rover to transform standard existing McKesson/HBOC SERIES reports into live data, allowing, for example, easy drill-down analysis of accounts receivable information on patient accounts. Rover can eliminate the programming middleman and put the power in the hands of the end users of the information.

Organizations with multiple users and multiple information sources will likely find it highly advantageous to store report files in a Report Management system, thus building a searchable electronic archive of indexed reports. Such a report archive should offer easy, fully secured access and distribution to end users via the web. Such web-enabled report management solutions offer very significant paper and printing savings, as well as a welcome alternative to antiquated archive technologies such as microfiche or COLD systems.
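The heart of a searchable archive is an index from words to the report pages that contain them. The toy Python class below is a minimal sketch of that idea, assuming form-feed page breaks and a simple per-user allow list of report IDs standing in for real access control; it is not how any particular report management product stores or secures its archive.

```python
import re
from collections import defaultdict

class ReportArchive:
    """Toy searchable archive: an inverted index from words to report pages."""

    def __init__(self):
        self.pages = {}                  # (report_id, page_no) -> page text
        self.index = defaultdict(set)    # WORD -> {(report_id, page_no), ...}

    def add_report(self, report_id, text):
        # Assumes pages are separated by form feeds, as in line-printer output.
        for page_no, page in enumerate(text.split("\f"), start=1):
            self.pages[(report_id, page_no)] = page
            for word in re.findall(r"[A-Za-z0-9]+", page.upper()):
                self.index[word].add((report_id, page_no))

    def search(self, *terms, allowed=None):
        """Return pages containing every term (AND search), sorted.

        `allowed` is an optional set of report IDs the user may see.
        """
        hits = [self.index.get(t.upper(), set()) for t in terms]
        found = set.intersection(*hits) if hits else set()
        if allowed is not None:
            found = {h for h in found if h[0] in allowed}
        return sorted(found)

archive = ReportArchive()
archive.add_report("AR-2002-04", "AGING REPORT\n10001 SMITH\fAGING REPORT\n10002 JONES")
archive.add_report("GL-2002-04", "GENERAL LEDGER\nACCOUNT 4000 SMITH")
print(archive.search("smith"))            # [('AR-2002-04', 1), ('GL-2002-04', 1)]
print(archive.search("aging", "jones"))   # [('AR-2002-04', 2)]
```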

Recognizing this, Barr Systems has expanded its report mining technology to the enterprise level with Rover, an enterprise solution offering, among other features, Report Management combined with web-based Business Intelligence. Rover transforms report-based information into live HTML data, which can be analyzed online, downloaded to Excel, or even loaded into a web-based multidimensional cube for graphical analysis at any level of detail.
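A multidimensional cube is, at its core, a set of pre-computed totals for every combination of dimension values, including rolled-up "all" levels, which is what makes drilling from summary to detail fast. The Python sketch below is a minimal, in-memory illustration under that assumption, similar in spirit to SQL's `GROUP BY CUBE`; the dimensions, the `ALL` marker and the figures are invented.

```python
from collections import defaultdict
from itertools import product

ALL = "*"   # marker meaning "rolled up over every value of this dimension"

def build_cube(records, dimensions, measure):
    """Pre-aggregate `measure` for every combination of detail and rolled-up
    dimension values (2**len(dimensions) cells per record)."""
    cube = defaultdict(float)
    for rec in records:
        # Each dimension is either kept at its detailed value or rolled up to ALL.
        choices = [(rec[d], ALL) for d in dimensions]
        for cell in product(*choices):
            cube[cell] += rec[measure]
    return dict(cube)

# Hypothetical records mined from an accounts-receivable report.
records = [
    {"region": "East", "payer": "Medicare", "balance": 100.0},
    {"region": "East", "payer": "Self-pay", "balance": 50.0},
    {"region": "West", "payer": "Medicare", "balance": 70.0},
]
cube = build_cube(records, ("region", "payer"), "balance")

print(cube[(ALL, ALL)])            # grand total: 220.0
print(cube[("East", ALL)])         # drill down to a region: 150.0
print(cube[("East", "Self-pay")])  # drill down to detail: 50.0
```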

Rover can fulfill informational needs across an organization, while fully leveraging existing operational systems such as ERP. Rover can repurpose existing ERP reports while adding value to them. These benefits can be realized by organizations running not only ERP and Supply Chain solutions, but also healthcare, banking/brokerage, retail and other industry-specific operational systems.

WEB-BASED REPORT MINING

Pros:
  • Distributes reports across the enterprise, plus data for interactive analysis, without programming
  • Slashes requests for custom data
Cons:
  • Unlikely to replace need for data warehouses at larger organizations (but can serve as an alternative to data marts)
  • Most “real time” data analysis still calls for SQL data query tools

Intuitive, user-friendly, flexible, consistent, and quick to deliver ROI: these are attributes that most enterprise managers would like their IT strategies to have. Report mining technologies fit this model well. Report mining solutions require no reengineering of existing reports or data, are easily understood by end users and IT personnel alike, and, because they use existing enterprise report output, fully complement whatever operational systems are already in place. These key attributes mean a high likelihood of business success and a low risk of failure: a very promising risk/reward ratio that few other business intelligence solutions can match.

Endnotes:

  1. Michael R. Bloomberg, “Simply Irresistible,” Bloomberg Personal, April 1998, p. 16.
  2. Rick Whiting, “Analysis Gap,” InformationWeek, April 22, 2002.
  3. Dan Ruben, “Is your office paper stacked against you?” Environmental News Network, November 13, 2001.
  4. “In the United States, It's Official: People Love Paper,” PaperCom Alliance Index Report, November 27, 2000.
  5. The Ken Orr Institute, “Data Warehousing Technology: A White Paper by Ken Orr,” 1996, Revised Edition, 2000.
  6. Erika Brown, “Analyze This,” Forbes, April 1, 2002.
  7. Wayne Eckerson, “Data Warehouses; Product Requirements, Architectures and Implementation Strategies,” Patricia Seybold Group, August 1994.
  8. Rick Whiting.
  9. Wayne Eckerson.
  10. Erika Brown.
  11. Gary Clark, “Strategies to Solutions: The Rules of Data Warehousing,” DM Review, February 1998.
  12. Nancy Mullen, “Enterprise Data: What's Useful and What's Junk?” DM Review, January 2002.
  13. Rick Whiting.
  14. Larry Greenfield, “The Case Against Data Warehousing,” The Data Warehousing Information Center (dwinfocenter.org), Updated April 12, 2002.
  15. Sharon Gaudin, “Data Mart Dynamics,” Computerworld, September 22, 1997, p. 99.
  16. Rick Whiting.

For more information, send email to info@barrsystems.com.

Copyright © 2007 Barr Systems, LLC