The ETL between OLAP and OLTP

I am an armchair data modeler. That makes me about as dangerous as a cotton ball. Most developers never think about modeling. I'm trained, read-up, and think hard about modeling concepts. Now that some of my hair-brained ideas have made it into production, I have some strong opinions.

OLTP is about data-write performance. Yet, it is typically judged by its performance to read data. That's because 99% of applications running reports against the OLTP model. Sometimes it's from ignorance, but sometimes it from the desire for "real-time" numbers.

The fallacy of real-time is ridiculous. About .0000000001% of applications really require real-time. If NASA can operate a $5B land rover with two hours latency, analysts can handle a 15 or 30 minute lag.

This is where the gap between OLAP and OLTP comes to play.

ETL moves and transforms our OLTP data to our report-friendly OLAP store. But the greater the gap between OLTP and OLAP, the greater the ETL execution cost. The greater that cost, the less often it can be executed. The less often it's executed, the greater the report latency. And so, here we are.

What creates the gap between OLTP and OLAP models? Can you create a good OLTP model that is "nearly" an OLAP model? No. But, yes. And in this post, I am going to share with you one excellent tip that will speed up ETL, increase reporting options, and cost hardly anything.

Let's take a simple OLTP table called users. Here's a typical structure:

CREATE TABLE Users
(
UserId int primary key identity(1, 1),
FirstName varchar(50),
LastName varchar(50)
)

There is nothing wrong with this structure. However, it provides no information to the downstream ETL. As a result, we must build time-based snapshots to understand changes. Instead, a simple schema change can simplify the ETL, shrinking that gap between the OLTP and OLAP models.

Let's take that OLTP table and update its structure:

CREATE TABLE Users
(
UserId int primary key identity(1, 1),
FirstName varchar(50),
LastName varchar(50),
CreatedOn DateTime,
UpdatedOn DateTime

)

With two new columns, the ETL comparison logic can easily identify new rows, updated rows, and quickly scan for deleted rows. I know ETL frequency determines slowly changing dimension granularity. But, things can get too granular. Spread your ETL executions and you improve OLTP performance and make negligible impact on report currency.

When you begin to reduce the OLTP and OLAP gap, you improve ETL performance (and complexity) – and, for the first time, you have the strategic opportunity to increase ETL frequency for high-risk periods of the year. Then, dial it back to something reasonable the rest of the year.

The answer to ETL performance cannot be hardware because databases scale up, not out. Storage, opposed to popular myth, isn't cheap. High-availability, high-performance, secure storage is still expensive to procure, operate, and maintain. That may not change.

The answer to ETL performance is reducing the contention between the OLTP model and the OLAP. You can't do everything, but you can do a lot. I've seen ELT processes reduce from 10 major steps to 8. That's huge. With some more techniques, I think we could see another 20% improvement.

But why?

Until it's fast and easy to move OLTP data to OLAP, businesses will run reports and analysis against the OLTP store. Once a reasonable frequency is established in your environment, reports will perform better (against the OLAP store) and your application will perform better (against the OLTP store).

Two closing remarks

1. CreatedOn and UpdatedOn columns are useless until populated. When the warehouse cannot trust their values, all is lost, and ETL developers will return to their evil ways. Build this into your testing practice, or, like I do, build it into your DAL so you can't forget.

2. ETL itself can negatively impact OLTP performance. I know that. But techniques like this and refusing to calculate anything until you're in the warehouse stage makes it an easy trade off from the myriad reports killing the store when it needs to handle transactions.

Everything that is Microsoft

For those of you wondering what you get with a Team Suite MSDN subscription, here's a peek. Plus, I think it's cool to have list of Everything (95%) that is Microsoft in one list.



Applications (57)

Access 2.0
Access 2003
Access 2007
Accounting
Business Contact Manager
Business ScoreCard Manager
Front Page
Groove 2007
Hyper-V Server 2008
Hyper-V Server 2008 R2
InfoPath 2003
InfoPath 2007
Interconnect 2004
Interconnect 2007
Internet Explorer 6.0
MapPoint 2004
MapPoint 2006
MapPoint 2009
Office 2002
Office 2003
Office 2007
Office 95
Office Communicator 2005
Office Communicator 2007
Office Communicator 2007 R2
Office Communicator Mobile
Office Communicator Web
Office Communicator Web Access
Office Servers 2007
Office XP
OneNote 2003
OneNote 2007
Outlook 2003
Outlook 2007
PerformancePoint Server
ProClarity
Project 2002
Project 2003
Project 2003 Server
Project 2007
Project 95
Project Portfolio Server 2006
Project Portfolio Server 2007
Publisher 2002
Publisher 2003
Publisher 2007
SharePoint Designer 2007
Small Business Accounting(App)
Virtual PC 2004
Virtual PC 2007
Virtual PC for Mac 6.1
Virtual PC for Mac 7.0
Virtual Server 2005
Virtual Server 2005 R2
Visio 2002
Visio 2003
Visio 2007

Business Solutions (23)

Dynamics AX 2009
Dynamics AX 4.0
Dynamics Axapta 3.0
Dynamics CRM 1.0
Dynamics CRM 1.2
Dynamics CRM 3.0
Dynamics CRM 4.0
Dynamics GP 10.0
Dynamics GP 7.5
Dynamics GP 8.0
Dynamics GP 9.0
Dynamics NAV 4.0
Dynamics NAV 5.0
Point of Sale 1.0
Small Business Accounting
Small Business Manager Financials 7.5
Small Business Manager Financials 8.0
Small Business Manager Financials 9.0
Solomon 5.5
Solomon 6.0
Solomon 6.5
Solomon 7.0
Solomon FRx
Designer Tools (2)
Expression 1
Expression 2
Developer Tools (38)
Access Developer Extensions 2003
Automatic Graph Layout
Electronic Learning Libraries
eMbedded Visual C++ 4.0
Macro Assembler 6.11
MDAC
ODBC Data Packs
QuickBasic 4.5
Robotics Studio
SharePoint Services 3.0 Tools
Visual Basic .NET 2003
Visual Basic 2.0
Visual Basic 3.0
Visual Basic 4.0
Visual Basic 6.0
Visual Basic 6.0 Code Advisor
Visual Basic Applications (VBA)
Visual C++ 1.52
Visual C++ 2.0
Visual C++ 4.2
Visual C++ Browser Toolkit
Visual C++ Tools
Visual FoxPro "Sedna"
Visual FoxPro 7.0
Visual FoxPro 8.0
Visual FoxPro 9.0
Visual J#.NET
Visual Modeler
Visual SourceSafe 2005
Visual SourceSafe 6.0d
Visual Studio 2005
Visual Studio 2008
Visual Studio Team System
Visual Studio.NET
Visual Studio.NET 2003
Windows Embedded CE 6.0 R2
Windows XP Embedded
XNA Game Studio

MSDN Library (11)

2001-10 MSDN Library October
2005-07 MSDN Library July
2006-01 MSDN Library January
2006-05 MSDN Library May
2006-06 MSDN Library June
2006-07 MSDN Library July
2006-08 MSDN Library August
2006-12 MSDN Library December
2007-01 MSDN Library January
2007-04 MSDN Library April
2007-06 MSDN Library June

Operating Systems (30)

Compute Cluster Pack
MS-DOS
Small Business Server 2003
Small Business Server 2003 R2
Windows 3.1 (16-bit)
Windows 3.11 (16-bit)
Windows 3.2 (16-bit)
Windows 7
Windows Advanced Server
Windows CE .NET Platform Builder 4.1
Windows CE .NET Platform Builder 4.2
Windows CE 5.0
Windows CE DirectX Kit
Windows CE Embedded 6.0
Windows CE Toolkit Visual C++ 6.0
Windows Essential Business Server 2008
Windows Internet Explorer 7
Windows Internet Explorer 8
Windows Server 2003
Windows Server 2003 R2
Windows Server 2008
Windows Server 2008 R2
Windows Services for UNIX 1.0
Windows Services for UNIX 2.0
Windows Services for UNIX 3.0
Windows Services for UNIX 3.5
Windows Small Business Server 2008
Windows Vista
Windows XP
Workgroups 3.11 (16-bit)

Servers (60)

Antigen
Application Virtualization for Terminal Services
BizTalk Server 2002
BizTalk Server 2004
BizTalk Server 2006
BizTalk Server 2006 R2
BizTalk Server Accelerators and Adapters
Commerce Server 2002
Commerce Server 2007
Connected Services Framework
Content Management Server
Customer Care Framework 2005
Customer Care Framework 2008
Customer Care Framework 2009
Desktop Optimization Pack
Exchange Server 2003
Exchange Server 2007
Forefront Client Security
Forefront Client Security Beta
Forefront Security for Exchange Server
Forefront Security for SharePoint
Forefront Server Security Management Console
Forms Server
Groove Server
Host Integration Server 2000
Host Integration Server 2004
Host Integration Server 2006
Identity Integration Server 2003
Identity Lifecycle Manager 2007
ISA Server 2004
ISA Server 2006
Live Communications Server 2003
Live Communications Server 2005
Mobile Info 2001 Server
Mobile Info 2002 Server
Office Communications Server 2007
Office Communications Server 2007 R2
Operations Manager 2000
Operations Manager 2005
Project Server
Search Server 2008
SharePoint Server 2001
SharePoint Server 2003
SharePoint Server 2007
SharePoint Server 2007 Search
Speech Server 2004
SQL Server 2000
SQL Server 2005
SQL Server 2008
SQL Server 6.5
System Center Capacity Planner
System Center Configuration Manager
System Center Data Protection Manager
System Center Essentials 2007
System Center Mobile Device Manager
System Center Operations Manager
System Center Reporting Manager 2006
System Center Virtual Machine Manager
Systems Management Server 2003
Systems Management Server 2003 R2

Tools and Resources (101)

.NET Compact Framework 1.0
.NET Compact Framework 2.0
.NET Framework 1.0
.NET Framework 1.1
.NET Framework 2.0
.NET Framework 3.5
.NET Framework Redistributable 1.0
.NET Framework Redistributable 2.0
.NET Framework Redistributable 3.0
.NET Micro Framework
16-bit SDK and DDK's
Academic Alliance Tools
Application Compatibility
ASP .NET AJAX 1.0
Bank Framework SDK and DDK's
Baseline Security Analyzer 2.0.1
Bookshelf Symbol 7
Cabinet SDK and DDK's
CMC SDK and DDK's
CRC 3.05
Data Analyzer 2002
Developer Security Resource Kit
Device Simulator
DHTML Editing SDK and DDK's
Digital Dashboard
DirectX SDK and DDK's
Enterprise Instrumentation framework
eScrum
ESP 1.0
FabriKam 3.1
GIF Animator
Healthcare Framework
Help Workshop
IIS
ILMerge
Interix 2.2
Internet Security and Acceleration
ISDN
Log Parser
Mobile Internet Toolkit
Modem Developer Kit
MSDN Online Web Resource
MSXML 4.0
MSYubin7 1.5
Office Resource Kit
Office System
Palm Size PC 1.2
PassPort Manager
Passport SDK and DDK's
Platform SDK and DDK's
Pocket PC 2003
Provisioning System
Repository SDK and DDK's
Security SDK and DDK's
Server Appliance Kit
Services for Netware
SharePoint Products and Technologies
Smart Card
Smart Tag SDK and DDK's
SMS 2003 SDK and DDK's
Soap Toolkit
Solver Foundation
Source Code Migration Tool
Speech SDK and DDK's
SQL 2000 Sybase
SQL Server Migration Assistant
SQL Server Tool (SQLH2)
Standard User Analyzer
Translation Glossaries
UDDI SDK and DDK's
Unified Communications Managed
Virtual Earth
Visio Tools
Visual Basic for Applications
Volume Shadow Copy Service
Web Control SDK and DDK's
Web Services Enhancements
Web Storage SDK and DDK's
Windows Academic Program
Windows Communication Foundation
Windows DNA XML
Windows Hardware Compatibility
Windows Installer
Windows Media
Windows Mobile 2002
Windows Mobile 2003
Windows Mobile 5.0
Windows Mobile 6.0
Windows NT 3.51
Windows Point of Service
Windows Real-Time Communications
Windows Rights Management
Windows SDK and DDK's
Windows Server Update
Windows SharePoint Services
Windows Vista Upgrade Advisor 1.0
Windows Workflow Foundation
WindowsFS
WindowsFX Beta 1
WindowsFX Runtime Components CTP
XML Parser 3.0

Three Architects

In software, "architect" and "engineer" are controversial. They overlap other industries' recognition of formal training and certification. In software, the implication of "formal" training does not exist. Someone may be an "architect" or "engineer" without formal training – indeed without any training.

I can't change the world. The vocabulary ship has already sailed. Nothing is going to unwind architect or engineer from the software lexicon. We have to roll with the punches. And since we are, let's try to establish a little clarity – specifically around architect.

Software architects design solutions, but may also write or develop software like engineers. Likewise, software engineers write or develop software, but may also design solutions like architects. One thing no one ever does is "architect" anything. The word architect is not a verb; design is a verb.

So, why not just call them designers. You see, there is an unfortunate ambiguity in the software industry with designer. Designer is typically understood to be a graphics designer – someone responsible for the look-and-feel of something. Basically, that parking spot is already taken by Mac users.

Let's start.

As I see it, there are three types of software architects. Hopefully, the subsequent text will bring you to the same conclusion. The most important part is: there is not just one type of software architect. That alone might be the most important point.

At its highest level, an architect is responsible for a plan. It is the level or scope of this plan that separates the different types. Although these three roles are discrete, some architects may fulfill two. Very few architects (very very few) can address all three. But, necessity guides the world.

The Enterprise Architect

The Enterprise Architect is the high-level visionary. He doesn't develop software beyond the white board. He thinks in multi-year, multi-platform, and multi-vendor terms. He is sensitive to economic, competitive, and strategic implications. He may have been an engineer once, maybe.

The Enterprise Architect produces the technical vision for an organization. He prefers PowerPoint over Visio, and presents to both technical managers and the Board of Directors. He is responsible for innovation, agility, and practicality. He may be titled "Chief Architect" or CTO.

An Enterprise Architect might say, "We want our products to be Software as a Service."

The Solution Architect

Less likely to wear a tie than the Enterprise Architect, the Solution Architect interacts with teams in practical ways. The Solution Architect (sometimes called Solutions Architect) selects technologies, platforms, and vendors in the context of real software initiatives. He was once an engineer.

The Solution Architect produces project designs for engineers. He prefers Visio over Visual Studio, and remains loosely engaged though a project's lifecycle. He is responsible for project integration, high level implementation consistency, and modernization.

A Solution Architect might say, "We want to expose Product ABC with Web Services."

The Technical Architect

The Technical Architect may be confused with a Lead Developer. A project with more than one team may have more than one Lead Developer. But a project will only have one Technical Architect. He interacts with the team daily. He selects components, patterns, standards. He settles debates.

The Technical Architect produces reference implementations for engineers. He prefers Visual Studio over anything else. He is typically a lead developer. He is responsible for team cohesion, software quality, and the correct implementation of the Solution Architect's design.

A Technical Architect might say, "Product ABC public services will use WCF over SSL."

Lead Developer (I mention him only to clarify Technical Architect.)

The Lead Developer, sometimes called Team Lead, is responsible of team morale, mentoring, code review, and on time delivery. He is a developer, but has the elevated authority necessary to ensure his team's adherence to the Technical Architect's reference implementation.

A Lead Developer might say, "I'll implement encryption, you validate the input parameters."

Wrap it up

Software terms are dangerous because they mean different things to different people. You may or may not agree with my definition of architect. No matter what, this is true: before your project begins, make sure every member has shared definition for titles and understand clearly who is what.



Process Reengineering Primer

In a good economy, organizations invest in process reengineering to promote operational efficiency, competitive advantage, market agility, and to improve day-to-day human factors.

In a struggling economy, organizations must also invest as efficiency, competitiveness, agility, and human factors are more vital to business success than ever.

Reengineering business processes is trickier than it sounds. Having an outsider involved can loosen tensions around problems, and help move the reengineering steps in a positive direction.

Process reengineering has three major steps

First, target vital processes. Every business has them. Some have one, some hundreds. Don't boil the ocean. Select a few critical, related processes to start. Then start.

It's typically easy to know which processes to choose. They are very core to the business, involve the most people, span the longest time, and often are the most painful.

Second, get stakeholder consensus on the process as it is today, the "as-is" model. Usually, this is difficult; most workers loosely agree, but individually customize their part of the process.

The goal is a model which all (or most) stakeholders agree represent today's process – even at a high level. Moreover, they agree to use it. Simply codifying process like this creates efficiencies.

A codified process helps create accountability insofar as task responsibility. It also helps new employees understand their role systematically, rather than organically.

Third, include management to reengineer the "as-is" model into a "to-be" model. Don't shoot for perfection. Clean up glaring flaws and work to improve day-to-day human factors.

What is a day-to-day human factor? These are process areas that drive people crazy. Things like superfluous paperwork, repetition, duplication, and throw-away, legacy busywork.

Models can be high-level or detailed. Err on the side of high-level. "Adaptive discovery" is a concept to implement the process happy path now and slowly fold in edge cases over time (if ever).

The real world

Having a "to-be" model isn't the same as working the "to-be" model. Once you have a better, more-ideal model ready, it's time to create a strategy to put it into everyday life.

Introducing a "to-be" model is not easy. Here are some tips:

  1. Keep it flexible. The "to-be" model isn't perfect, yet. You may need to tweak it on-the-fly; you will disprove assumptions. Have someone authorized to make these changes.
  2. Open doors. Make sure stakeholders have an avenue to give feedback. They are your best measure to success and the fastest path to failure (user acceptance).
  3. Remember the pain. Workers won't like this level of change right away. Make it easier by reminding them of the "as-is" problems addressed by the "to-be" model.
  4. Top down. Getting models mapped is a bottom-up approach; implementing them is top-down. Management must be bought in, and reiterate "this will happen" no matter what.
  5. All at once. Granted, some existing work will need to complete using the old model. However, pick a date. From then on, no new work follows the old model – bring down the hammer.

Use a system

In the software world, Business Process Management (BPM) is the tool family that implements process models, presents tasks, governs adherence, and reports progress. They can be free, expensive, simple, and complex. Whichever you choose, just choose. A BPM can contribute to a "to-be" model's success more than any other factor (other than user acceptance).

Here's where some complexity comes in. An expert should have been used to help reengineer your processes. An expert should be used to integrate a BPM.

I have introduced BMP three ways. As a software architect, I think a lot about how to integrate BPM with systems. To that end, I think these represent the only viable options. They are:

  1. Full Integration. We built a line of business system from scratch. BPM was in its DNA. Using the system was using the BPM if the user knew it or not.
  2. Partial Integration. We extended an existing system. Users interacted with a side-by-side application which also monitored user activity to minimize manual inputs.
  3. No Integration. We implemented a status tracking system for manual processes which could not be otherwise tracked. Users used task lists to interact with the BPM.

Which option you pick will greatly impact user acceptance. Sometimes you only have two choices, sometimes one. You should know your choices, understand the implications, and choose.

Reporting Shangri-La

Workflow nirvana from the perspective of the worker has to do with authority, responsibility, and clarity. But, workflow nirvana from the perspective of management has to do with efficiency, visibility, enforcement, and reports.

Reports in workflow try to answer some common questions:

  1. What is making my process slow?
  2. What could make my process faster?
  3. What if I made a change like XYZ?
  4. How are my employees performing?
  5. What's the current status of ABC?

Here's the problem. Reports are at the end. You need data to see trends. So, many organizations feel reports can be shelved for a later date. They are correct. However, most NEVER return to them and NEVER garner the exponential benefits they offer.

If you read nothing else, read this: the myriad benefits of reengineering business process pale to the benefits delivered through back-end reports. Follow the steps above, but don't stop until you have the reports necessary to answer your specific business questions.

Next, I'll talk about agility with BPM and "continual improvement".