Data Migration
According to Wikipedia, data migration is “the transferring of data between storage types, formats or computer systems… it is required when organizations or individuals change computer systems or upgrade to new systems.” According to Webopedia it is “1) the process of translating data from one format to another… necessary when an organization decides to use a new computing system or database management system that is incompatible with the current system and/or 2) the process of moving data from one storage device to another, in which case data migration is the same as Hierarchical Storage Management (HSM).”

In part because of the qualifications (“it is required…”, “necessary when…”) that form part of both of these definitions, neither is correct. For example, the Wikipedia definition implies that if you upgrade your system then you will need to migrate your data, which isn’t true. Similarly, you might infer from the Webopedia definition that compatibility is a key issue. Indeed it is, but this definition might suggest that existing systems are compatible when they are not.

Migration literally means movement from one place to another. However, if you were to move from Boston to New York or from Manchester to London you would not normally describe that as migration. On the other hand, if you moved (on a permanent basis) from Boston to Paris then you would be (e)migrating from your perspective or (im)migrating from the point of view of a French person. The difference is that you are moving from one cultural, linguistic and political milieu to another, which will involve significant change. This is the essential difference between moving and migrating: the latter involves major change while the former does not.

To take a simple example, if you wish to consolidate multiple (Oracle) Siebel CRM systems to a single instance of Siebel, and all of those systems use the same version of that software (and the systems have not been customised, which admittedly is unlikely), then this would represent a data movement task and not data migration. Conversely, consolidating multiple, heterogeneous CRM systems into one new one would require data migration. The difference is that the structure of the data will be different across heterogeneous systems and that data will need to be altered in some way (including format and style changes, semantic reconciliation, de-duplication, perhaps just that the data is now used differently, and so on) in order to fit the demands of the new system.

Similarly, moving data from one storage device to another, for example, is data movement: data migration would only be involved when the format of the data has to be changed in the process. For the same reason, if you move your applications from an Oracle database to MySQL, say, but retain your existing applications (and therefore the same schema) then that is a data movement exercise and not one of data migration.

A further consideration is with respect to conventional ETL (extract, transform and load) processes. Under both of the definitions quoted above, ETL, at least as it relates to loading data into a data warehouse, would be considered a form of data migration. Certainly some of the same techniques might be employed but we do not believe that this is how data migration is commonly understood. So, we need a definition of data migration that excludes both pure data movement and the sort of ETL processes that are used to load a data warehouse.

Finally, a further distinction must be made between data migration and data integration more generally. The former is a one-time exercise whereas most other forms of data integration, such as loading a data warehouse, are on-going.

It is worth considering a number of common use cases to see whether these actually involve data migration or not:

• Moving from one or more databases or storage types to another, with all other things remaining constant – not data migration.

• Loading a data warehouse – not data migration.

• Archival, where older or less-used information is moved to secondary storage – not data migration.

• Migrating from one application to another (such as Oracle to SAP) – will require data migration.

• Implementing a new application where (some of) the data for the new application already exists within existing applications – will require data migration.

• Upgrading from one version of an application to another – may require data migration, if data formats change.

• Implementing master data management – may require data migration, if there is a physical movement of the data into, say, a hub. In other circumstances, procedures similar to conventional uses of ETL may be needed.

• The capture, transformation and pass-through of data taken from message queues; EDI, SWIFT or HIPAA messages; or via change data capture – does not involve data migration.

The first conclusion we can draw from this is that data migration never exists in isolation: it is always linked to applications in some way, whether that be an application migration, implementation or upgrade. This is one reason why there is no such thing as a data migration discipline per se, because it is always a subset of some broader effort. However, this may be a reason but it is not an excuse: the use of ETL is part of a broader discipline but it does have its own discipline area.

Getting back specifically to the question of how we define data migration, the second conclusion we can draw from the use cases above is that the usage of the data, if by that we mean whether the data is used for operational or analytic purposes, remains constant. With data migration you never migrate from an operational environment to an analytic (business intelligence) environment or vice versa.

Note that we are defining usage here as distinct from context. Loading data into a warehouse changes the context in which it is used, as does migrating data from an ERP system into a CRM system. However, not all data migrations change context. For example, the consolidation of heterogeneous CRM systems does not change the context of the data: it is still CRM.

Our definition is therefore as follows:

“Data migration is any movement of persistent data that involves some sort of restructuring of that data while the usage of that data (operational versus analytical) remains constant.”

Note that you could optionally replace “movement of persistent data” in this definition with “one-time movement of data”. Either of these qualifiers will exclude the sort of data integration that involves the transformation of things such as SWIFT and EDI messages, which is an on-going process for what are essentially transitory objects (though these may be persisted, this is done for auditing reasons rather than operational processing; in other words the usage would change).

It is also arguable that there are two classes of data migration: simple and complex, where the former retains the same context (that is, the same application type) while the latter involves a new context (a new application type or, perhaps, MDM). While we will not be pursuing this idea further in this paper it is worth bearing in mind this additional complexity.
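
As a rough illustration, the definition given above can be written as a simple test and applied to some of the use cases listed earlier. The Python sketch below is for illustration only: the `Movement` record, its field names and the `is_data_migration` function are hypothetical, the test applies both the restructuring and the one-time qualifiers together, and the flags assigned to each use case are a simplified reading of the text rather than a formal classification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movement:
    """A hypothetical description of a data movement exercise."""
    name: str
    restructures_data: bool  # does the structure or format of the data change?
    usage_before: str        # "operational" or "analytical"
    usage_after: str
    one_time: bool           # a one-time exercise, as opposed to an on-going one


def is_data_migration(m: Movement) -> bool:
    """One-time movement, with restructuring, where the usage stays the same."""
    return m.one_time and m.restructures_data and m.usage_before == m.usage_after


use_cases = [
    Movement("storage move, nothing else changes", False, "operational", "operational", True),
    Movement("loading a data warehouse", True, "operational", "analytical", False),
    Movement("application migration (Oracle to SAP)", True, "operational", "operational", True),
    Movement("upgrade with no format change", False, "operational", "operational", True),
]

for m in use_cases:
    verdict = "data migration" if is_data_migration(m) else "not data migration"
    print(f"{m.name}: {verdict}")
```

Under these assumptions only the application migration qualifies; the other three fail on restructuring, on usage or on being a one-time exercise.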
The perception of data migration
It is our belief that a major part of the reason behind the failure of so many migration projects to meet target and budget is the lack of regard for the expertise and processes required for data migration. In this section we will examine the reasons behind this in more detail.

The implication of applications

Data migration is invariably a function associated with applications. More precisely, data migration is always a subset of some overarching application project. The most important implication of this is that it is always those who are making decisions about the application(s) that are in control of the project. Unfortunately, such individuals tend not to be experts in dealing with data: they focus on processes and interfaces rather than data semantics. Often these people will not appreciate the scope of any data issues that may be involved in the project and they will frequently underestimate what is needed in terms of time, budget and personnel in order to manage those issues. This is exacerbated by the fact that data migration is not recognised as a discipline in its own right within data management so there are few, if any, in-company experts who can advise on such matters. Moreover, it is commonly the case that systems integrators will separate the application migration aspects of a project from the data migration component (leaving this to the customer), ostensibly because they do not understand the data as well as their client but, at least in part, because they recognise its difficulty.

What all of this means is that, if not actually an afterthought, the issues arising from data migration are not given the full consideration they need in many projects. This frequently results in cost overruns, late delivery and lack of user acceptance for the overall project when it is finally delivered (if it ever is). In our view, it is the failure to give due consideration to data migration that is the most common cause of these problems.

What is needed is a strong data management presence within the overall project team, so that data issues are given proper consideration. Further, the data management team members must understand the techniques and strategies required for data migration, along with the tools that can be used to assist with those processes. However, this raises issues of its own.

The prevailing view of data migration

Within most organisations there is no such thing as a data migration expert. Indeed, data migration is regarded as a dead-end job that nobody wants. Why? Well, to begin with you get roasted if you fail but you don’t get any kudos if you succeed. Secondly, it is regarded as boring. That’s a matter of opinion. Thirdly, it is a short-term job. You are dedicated to the task for 6 or 12 months but what are you qualified to do afterwards? Fourth, your team will be seconded out to the applications guys whenever there is a problem over delivery on the software side (because those are the guys that make the decisions, remember, and therefore what they are doing is more important), which means that you can’t properly plan your part of the project. Fifth, as soon as the project is finished you will be out of a job, which is hardly an encouragement to finish on time. Sixth, if the greater project is an application migration project (from Oracle to SAP, say) then you will be working mostly with the Oracle experts who themselves will be out of a job when the project is done. Finally, such projects can often become embroiled in what seem like intractable political struggles and why would you want the stress?

Needless to say, everybody in the IT department knows all of this (at least viscerally if not in detail). As a result no-one wants to work on the data migration team. This in turn means that, in many cases, it is the less experienced people and the staff with less political clout that get the data migration jobs. So, they are even less well placed to ensure that the necessary resources are available to the team. Is it surprising that so many application migrations and upgrades run over time and budget? For very many companies this is almost inevitable before the project even begins, because not enough emphasis is placed on data migration.

We do not think we are saying anything new here. However, the question that then arises is why matters continue in this fashion and why people continue to make the same mistakes. Of course, the simple answer is that this is a result of organisational staffing and budgetary practices but that is not very helpful as an explanation. In fact, we believe that the heart of the problem lies in the fact that the people who manage the data in most IT departments are regarded as second-class citizens: applications are sexy, data is not. Moreover, even within the data management group, data migration is about the least sexy thing you can be doing.

In other words, the root cause of data migration problems derives from a perception of data management in general, and of those involved in data migration in particular. In our view, this perception (and especially with respect to data analysis – see later) needs a wind shift of 180°. Companies that continue to have the sort of view of data migration just described will continue to have failed and over-running projects.

Using external resources

Recognising the relative importance of data migration by an individual is one thing but changing the culture of an organisation is another. It may well be that members of the data management team recognise the importance of these arguments as will, perhaps, the CIO and some others but if the IT department as a whole is not convinced then problems will persist. In such environments it may be better to consider the use of external specialists for data migration rather than use in-house resources. While consultants approach these matters in different ways (some will not touch data migration), the approach that we would recommend would be one where the external data migration specialist charges a fixed fee for an initial evaluation after which they will provide a fixed price quotation for doing the job as a whole. It will be an advantage if the consultant can provide hosting services on a temporary basis during the cut-over process (though this depends on the data migration strategy adopted; see later).

Note that data migration consultants can only do what they are asked to do. Give them the data and tell them what transformations are required and they should be able to migrate it for you. However, as we shall see in the next section, herein lies the rub: you often don’t know where all the data is or how you will need to transform it.

One final point: as we discuss later, data migration is essentially a business issue. This means that whatever external resources you use, you should not give them ownership of the project.
The principles of data migration
There are many aspects of data migration projects that are similar to those that are required for other types of project, such as project control itself, documentation and reporting, testing, the building of data models and so forth. However, there are also requirements that are much more specific, albeit that there is still overlap with other types of migration and data movement projects. In this section we shall consider these specific requirements in terms of the techniques that are required and the strategy that might be adopted.

There are seven broad techniques that need to be used:

1. discovering and selecting your sources,

2. understanding your data,

3. cleansing your data,

4. transforming your data,

5. moving your data,

6. testing and validation,

7. auditing and documentation.

The last three of these are fundamental to any successful project but they are, we believe, well understood (though testing, which is probably the least sexy part of the non-sexy data migration, can be problematic because you have nothing to compare your results with since you have, by definition, a new context for that data). Moreover, they are not typically the cause of data migration problems. Thus this section focuses on the first four of these techniques.

Source discovery

There is no technique or methodology for source discovery per se. However, that does not mean that it is not important. The point is this: you need to know about all the sources of data that will populate the new system. To take a very simple example, suppose that you are migrating from SAP to Oracle. It might appear that the answer is straightforward: the database(s) used to host SAP will source the Oracle application. However, it may well be that there was a separate HR system that will now be incorporated within the new system. Okay, you probably know about that. What you may not know about, and what even your IT colleagues on the application side may not know about, is that the Finance department (say) want to use the new system because it expands on the capabilities that were previously available, which they had previously had to provide through Excel spreadsheets and local Access databases.

The first step in identifying the scale of the data migration sub-project must therefore be to ensure that you know about all the data sources that will feed into the new system. Unfortunately there is no tool to help you do this; all you can do is to talk to your users. Note that we don’t believe that talking to your colleagues in the IT department will be sufficient: they probably don’t even know that these Access databases exist. In any case, as John Morris states in his book “Practical Data Migration” (published by BCS) “data migration is a business not a technical issue” and “the business knows best” so you cannot expect to run a successful data migration exercise without talking to users. Moreover, it will be the business users that sign off the project at the end of the day so keeping a close liaison between the project team and the business should be a major consideration at all times. It is also worth noting that users can often be reluctant to sign off on projects so keeping them fully in the loop is of paramount importance.

These two quotes represent John’s first two Golden Rules of data migration. Bear in mind, though, that the business won’t know all you need to know: migration requires the business and IT to work in collaboration.

Along with the identification of data sources there is also an issue with respect to selecting the most appropriate data source to use when data can be found in more than one place. A common mistake here is to assume that the “corporate” data source is the most accurate one. In practice, it may well be that a salesman’s personal records have better contact details than the company’s CRM system, or a maintenance engineer’s notes may be more comprehensive than the asset inventory.

Once you know where all of your data sources are, you can identify all of the data that you have that can be used to populate the database for the new system. At this point you need to do a gap analysis: does the data that you have match the requirements for data in the new system? In particular, are there any mandatory fields in the new application for which you do not currently hold data? If the data you have does not meet these requirements then you have either missed a data source or you genuinely do not record that information at present, in which case you will have to decide how to collect that data.

On this note there is a further point: what about data that you don’t know that you know about? For example, you may not think that you know someone’s postal or zip code but you can use enrichment tools to automatically discover this information. Similarly, you can infer other sorts of ‘missing’ data by using profiling and other tools. However, this is the subject (at least in part) of the next section.
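
The gap analysis described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tool recommendation: the `gap_analysis` function, the target field list and the source names are all hypothetical, and a real exercise would compare data types, formats and semantics as well as field names.

```python
from typing import Dict, List, Set


def gap_analysis(target_fields: Dict[str, bool],
                 sources: Dict[str, Set[str]]) -> Dict[str, object]:
    """Compare the target system's fields with the fields held in each source."""
    # Map every target field to the (sorted) names of the sources that hold it.
    found_in: Dict[str, List[str]] = {
        field: sorted(name for name, fields in sources.items() if field in fields)
        for field in target_fields
    }
    missing = [field for field, where in found_in.items() if not where]
    return {
        "found_in": found_in,
        "missing_mandatory": sorted(f for f in missing if target_fields[f]),
        "missing_optional": sorted(f for f in missing if not target_fields[f]),
    }


# Hypothetical target schema: field name -> is the field mandatory?
target = {"customer_id": True, "name": True, "postcode": True, "fax": False}

# Hypothetical sources, including one found only by talking to users.
sources = {
    "legacy_erp": {"customer_id", "name"},
    "finance_spreadsheet": {"customer_id", "name", "credit_limit"},
}

report = gap_analysis(target, sources)
print(report["missing_mandatory"])  # fields to find a source for, or to collect
print(report["found_in"]["name"])   # several candidate sources to choose between
```

A mandatory field with no source (here `postcode`) points either to a missed data source or to information that is not currently recorded; a field held in several sources is a prompt to decide which source is the most appropriate.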
Finally, we should consider how these techniques are applied. While we have listed them in ‘chronological’ order we do not believe that they should be implemented using a conventional waterfall approach. The danger then becomes that you never get to the end of the ‘understanding the data’ phase. Moreover, if you have a project plan that reads discovery: months 1–4, movement: months 5–7, test and validate: months 8–12 you can easily end up not knowing until month 9 that you have a project slippage.

While you certainly need to do a detailed profiling overview in order to set timescales and budgets, during the actual process of the work a more flexible and iterative approach is required, whereby you profile each subset of the data in turn, cleanse it, transform it and test it. For example, if you are doing an SAP migration then you might start with the materials master, move that data, validate that all of the data from the source has been properly moved, then move up the chain to the next set of data that depends on that lower set, and so on. In other words, you need to take an ‘agile’ programming approach in which each of these techniques is used, in turn, in an iterative fashion. This way you may find that you have a project slippage in week 6 but it is better to know that earlier rather than later.

Note that while profiling and discovery is an iterative process it generally needs to be methodology driven. One approach is to perform a ‘target led analysis’, focusing the profiling on entities and attributes that are most important to the target system (for example, key entities, primary/foreign key fields, dependencies and business constraints). There is little point in performing analysis on attributes that are not going to be migrated so what you need is a slice and dice approach, along with a well-defined methodology, particularly for dealing with data quality problems as they occur. Data migration teams should work with the business analysts during the profiling process to resolve data quality and business data problems as they are discovered, rather than leaving this until late in the project lifecycle.

There is one other problem associated with data migration that we also need to mention and this is specifically with respect to the budgeting of project teams. Because of the agile and iterative nature of migration projects you will typically require short bursts of specific skill set resources at various times during the project. What this means is that the percentage utilisation of your resources is a major factor in cost control. For example, if the total project requires 1,000 man days of resource but you can only utilise those resources at a rate of 75% then your budget will need to reflect this fact unless those resources can be re-deployed. However, the danger is that upon re-deployment those resources will not then be available for the migration project when they are needed, which will impinge on project timescales. And, of course, if the company changes its priorities at any time during the course of the project then utilisation can easily drop to 50% or lower, bringing about a consequential cost increase.

What is needed is an approach that is predicated upon pooled resources or very flexible contracting teams. In large enterprises, where multiple migrations may be taking place (our research indicates that the average Global 200 company runs 4.48 migrations per annum) it will make sense to run these migrations in parallel so that resources can be shared on a staggered timeline.
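
To make the effect of utilisation on the budget concrete, here is a small worked calculation in Python. It rests on one simplifying assumption that is not stated in the text: that you pay for every day a resource is assigned to the project, so the days to budget are the productive days divided by the utilisation rate. The `budgeted_days` function is a hypothetical name introduced for this illustration.

```python
import math


def budgeted_days(productive_days: float, utilisation: float) -> int:
    """Days to pay for when only a fraction of the assigned time is productive."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be greater than 0 and at most 1")
    return math.ceil(productive_days / utilisation)


for rate in (1.0, 0.75, 0.5):
    print(f"{rate:.0%} utilisation: {budgeted_days(1000, rate)} days to budget")
```

On this assumption a 1,000 man day project needs a budget of 1,334 days at 75% utilisation and 2,000 days at 50%, which is why re-deployment and pooled resources matter so much for cost control.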
Strategies for data migration
Whereas you need all of the techniques of data migration, the strategies you might apply to data
migration and similar projects represent options. There are essentially three of these: big bang,
parallel running and incremental, which we will discuss in turn.
Using the big bang strategy you continue running on your original system(s) throughout the duration of the migration project. The new system is not up and running at any stage. At more or less the last moment you load all of your data onto your new system, turn off the old one and then start running in the new environment. The two big advantages of this approach are, first, that it eliminates the problems of dependencies between systems that arise when, under non-big-bang approaches, some of the old systems have to run side-by-side with the new ones; and second, that it minimises your hardware and systems requirement since the only time during which you have two systems running is during the changeover period. This, at least, is the theory, though it is often not realised in practice.

This changeover process typically takes a weekend though we know of one telecommunications company that turned off its billing system for two weeks during such a changeover. Consider the cost of that to the business. No doubt the company originally expected that it could do this over the proverbial weekend but in the event it turned out to be a much bigger bang than was originally anticipated. However, even if the weekend is all you need we are increasingly living in a world where you cannot take part of the business down at all. It may be fine for an HR application, say, but for any on-line applications the cost to the business is likely to overwhelm any savings to be made by taking this approach.

Parallel running starts the same way as the big bang approach but you do not intend to cut over to the new system as soon as the data is loaded. Instead, you will run both systems in parallel, with the original system(s) continuing to serve the business while the new system is thoroughly tested until you are ready for a final cutover. The advantage of this approach is that the new system is more or less guaranteed to work correctly when you finally move to it. In addition, you should have minimal downtime between turning off the old system and turning on the new one. The downside is that this parallel running is expensive, not just in computing resources but also in terms of the people required to run it, and the checking that needs to go on to ensure that the new system is doing what it is supposed to.

Parallel running is also more complex, as an environment, when compared to the big bang approach. This is because incoming data is entered into the old system and you also need to propagate that data to the new system. This means that you need some sort of synchronisation software to keep the two systems in parallel.
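
The synchronisation requirement of parallel running can be pictured with a toy Python sketch. Everything in it is hypothetical and introduced purely for illustration: the in-memory `System` class stands in for a real application database, `transform` stands in for whatever restructuring the migration requires, and the `Synchroniser` shows only the idea of propagating each incoming write and checking that the two systems remain in step. Real synchronisation software is considerably more involved.

```python
from typing import Dict, List

Record = Dict[str, str]


class System:
    """A hypothetical stand-in for an application database, keyed by record id."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: Dict[str, Record] = {}

    def write(self, key: str, record: Record) -> None:
        self.records[key] = dict(record)


def transform(record: Record) -> Record:
    """Hypothetical restructuring: the new system stores the name in two fields."""
    first, _, last = record["name"].partition(" ")
    return {"first_name": first, "last_name": last}


class Synchroniser:
    """Enters incoming data into the old system and propagates it to the new one."""

    def __init__(self, old: System, new: System) -> None:
        self.old, self.new = old, new

    def write(self, key: str, record: Record) -> None:
        self.old.write(key, record)             # the old system still serves the business
        self.new.write(key, transform(record))  # keep the new system in step

    def out_of_step(self) -> List[str]:
        """Keys whose new-system record differs from the transformed old record."""
        return sorted(
            key for key, rec in self.old.records.items()
            if self.new.records.get(key) != transform(rec)
        )


old, new = System("old"), System("new")
sync = Synchroniser(old, new)
sync.write("c1", {"name": "Ada Lovelace"})
old.write("c2", {"name": "Alan Turing"})  # a write that bypassed synchronisation
print(sync.out_of_step())
```

The `out_of_step` check corresponds to the on-going checking mentioned above: it reports `c2`, the record that reached the old system without being propagated.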
Incremental migration works in a similar fashion to parallel running except
that you incrementally turn on the new system rather than turning it on all at
once. So, for example, you might be in a position where transactions for some
customers are processed by the old system and transactions for others are
processed by the new system. Similarly, some functions might be executed by
the old system and some functions by the new one, though this will be as a
result of the new application(s) rather than data migration per se. This is
one of the advantages claimed for incremental migration: that you can start to
get a return on your investment much sooner.

Note that by the time that everything has been moved it will be, more or less,
time to turn off the old system, so there will be very little, if any,
full-scale parallel running.

The traditional method of supporting incremental migration is to place a flag
against the relevant data fields in the source database, which tells the
application to look for that data in the new system. The disadvantages of this
approach are that you have to amend all of the applications that address this
data, you have to amend the data tables (and the database schema) in order to
enable this, and performance will deteriorate when the data is in the new
system. An alternative approach is to use a state model that keeps a record of
the current state of the data migration (that is, it knows which data is live
in which system). The application is then directed initially at the state
model, which tells the application where the relevant data is located. Note
that there is no requirement to amend any data sources when using this method
and for this reason alone the state model approach is to be preferred over the
flag-based method. However, this is not the only reason why it is better.

Alongside the first-line strategies we have just discussed, a secondary
consideration is whether you wish to adopt an approach predicated on having
zero downtime during the migration (or as near zero as possible: we know of
one implementation where, after a nine-month migration project, there were
four minutes of downtime, and even this was because the company was changing
its application server). This decision will depend on the importance of the
application and data you are migrating. If this supports a web-based
application or some other mission-critical application that needs to be up
24x7 then a zero-downtime approach should, properly, be mandated; though this
may not be the case if you already use mirrored, redundant systems where you
can turn one of these off for the duration of the migration and leave the
other one(s) running. Zero-downtime migrations should also be considered
whenever a failure to go live, accurately, on time, may be detrimental to
staff morale.

Note that a zero-downtime approach is more or less implicit within the
incremental and parallel strategies but is inimical to the big bang.
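
The state model approach to incremental migration described above can be illustrated with a short sketch. This is one possible shape under simplifying assumptions, not a reference implementation: data is migrated per customer, both systems are modelled as in-memory dictionaries, and the names StateModel, migrate and read are hypothetical. The point it shows is that the application consults the state model first and is then routed to whichever system holds the live data, while the source data itself carries no flags and needs no schema change.

```python
class StateModel:
    """Records which system holds the live copy of each customer's data."""

    def __init__(self, customer_ids):
        # At the start of the migration everything is live in the old system.
        self._location = {cid: "old" for cid in customer_ids}

    def locate(self, customer_id):
        return self._location[customer_id]

    def mark_migrated(self, customer_id):
        self._location[customer_id] = "new"

    def remaining(self):
        """Customers whose data is still live in the old system."""
        return sorted(c for c, loc in self._location.items() if loc == "old")


def migrate(customer_id, systems, state):
    """Copy one customer's record to the new system, then update the state."""
    systems["new"][customer_id] = dict(systems["old"][customer_id])
    state.mark_migrated(customer_id)


def read(customer_id, systems, state):
    """The application asks the state model first, then reads that system."""
    return systems[state.locate(customer_id)][customer_id]


# Illustrative data only.
systems = {"old": {"c1": {"balance": 10}, "c2": {"balance": 20}}, "new": {}}
state = StateModel(systems["old"])

migrate("c1", systems, state)
systems["new"]["c1"]["balance"] = 15  # a later update lands in the new system

print(read("c1", systems, state))  # served from the new system
print(read("c2", systems, state))  # still served from the old system
print(state.remaining())
```

Because routing lives in the state model rather than in flags stored alongside the data, the old system's tables are left untouched, and the state model doubles as a record of how far the migration has progressed.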
1.  The data migration aspect of the project should be set up with its own
    management team and its own project planning. It should be treated as an
    independent sub-project within the overall project, with its own budget
    and resources.

2.  The data migration project team needs to identify all the relevant data
    sources and undertake initial profiling and cross-system data analysis, so
    that the team can estimate the total costs, resources and time required
    for the sub-project BEFORE budget and delivery schedules for the whole
    project are finalised.

3.  As a matter of principle, staff should not be re-allocated from the data
    migration team to firefight other issues within the project or elsewhere
    within IT, unless the data migration sub-project is ahead of schedule, and
    then only to the extent that it is not put behind schedule.

4.  We recommend the use of appropriate tools, both in order to understand the
    data to be migrated and its relationships, and for transforming and moving
    the data. In the case of profiling, data relationship discovery and
    analysis tools, the environment (especially for any large-scale
    migrations) is likely to be too complex to achieve this manually. In the
    case of transformation and movement tools we recommend their use primarily
    because they should make it much easier to back out of failed processes.
    In addition, the fact that an audit trail and documentation should be
    automatically provided will be an advantage. We also recommend the use of
    data cleansing tools, not just for their own sake but also because they
    will allow you to measure the quality of data (and they may also be useful
    for certain types of complex transformations). In addition, if the state
    model approach for incremental migration is adopted, then an appropriate
    tool will be required. The same is true if you are adopting an approach
    that employs data synchronisation in a parallel processing environment.

In our view, data migration has historically been under-valued,
under-resourced and not treated with the attention it deserves. As a result,
many major migration projects have failed either partially or completely. The
remedy for these failures, we believe, is to accord data migration the respect
it deserves: to consider it in detail prior to the initiation of relevant
projects and not as an afterthought. In turn, this will result in increased
status for data migration as a discipline, which will mean that more
experienced and capable staff are attracted to this function. In a virtuous
circle, this will then improve the practice of data migration, which will feed
back into improved costing and delivery for broader projects that involve data
migration.

Further information

Bloor Research will keep a page dedicated to this white paper on their website
for further information on this topic as it becomes available. The page can be
accessed at:
http://www.bloor-research.com/research/white_paper/875/data_migration.html