Content processing for site migration

Learn how to use the Migration API to massage and migrate source data into Drupal. Use cases include site upgrades, migrating from other platforms, converting content to other data types and continuously updating content from a remote source.

17 December 2019

 

Introduction

Hi everyone, welcome to DrupalSouth and thanks for coming to my session. Today I am going to share my experience with content migration.

Let me start by introducing myself. I'm a Drupal developer at Morpht, doing back-end and front-end development. Recently I have been focusing on web accessibility and content migration.

The Migrate API in Drupal is a three-step process: extract, transform and load. Extract is the step where we get the data from our source. The raw data may not be in a format that works in Drupal, which is why we need process plugins to transform it into something Drupal can use. Load is the step where the transformed data is saved into Drupal.
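The three steps can be pictured in plain PHP. This is only a minimal sketch and not how the Migrate API is implemented: the function names extract_rows, transform_row and load_row are hypothetical, and an in-memory array stands in for Drupal's storage.

```php
<?php

// Hypothetical stand-in for a source plugin: returns the raw rows.
function extract_rows(): array {
  return [
    ['title' => '  Article 1 ', 'tags' => 'tag 1, tag 2'],
  ];
}

// Hypothetical stand-in for process plugins: reshapes one raw row.
function transform_row(array $row): array {
  return [
    'title' => trim($row['title']),
    'tags' => explode(', ', $row['tags']),
  ];
}

// Hypothetical stand-in for a destination plugin: saves one row.
function load_row(array $row, array &$storage): void {
  $storage[] = $row;
}

$storage = [];
foreach (extract_rows() as $raw) {
  load_row(transform_row($raw), $storage);
}
```

In a real migration each of these roles is filled by a plugin configured in YAML rather than by hand-written functions.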
 

What can we do with migrations?

One-off migrations. Examples:

  • Upgrade a site from Drupal 7 to 8.
  • Rebuild a non-Drupal site in Drupal and transfer the content across.
  • Bulk content update, like populating the data in a new field or transferring data from one content type to another.

Continuous migrations. Examples:

  • Getting weather information from the forecast on a regular basis.
  • Getting the stock price from the share market.
  • Distributing content into multiple satellite sites from a centralized content repository.

Migration source

Migrate API supports a wide range of data sources: CSV, XML, JSON, JSON API, etc. We can also make direct SQL queries to the database if we have access to it.

Sometimes exporting data from the source is not possible, the data structure is too complicated, or the data is too hard to handle. If that is the case, we can do content scraping. There are a number of libraries that allow us to scrape content from a website.

What's good about content scraping? “What you see is what you get”. An image is an image, rather than data in a token format that needs further processing to retrieve. It also works for both Drupal and non-Drupal sites.

The downside, however, is that the content we get is not well structured, and it is hard to deal with things like entity references.
 

Migration examples

Let's say our client wants us to migrate a simple blog site like the one below.

Morpht Blog

 

What are we dealing with?

From the front page, we can see that each article has a title, a publication date and a summary field.

Inside the article, there are tags, an HTML body, internal links, PDF downloads, etc.

Editor experience

There are also other hidden elements, like the publishing status, menu, URL aliases, and 301 redirects if we want to transfer the traffic from the old path to a new one.

 

What are the challenges?

Let's look at the date. “26 November 2019” is the format we get from the source, but to import it into a date field in Drupal, “2019-11-26” is the format we need.

For taxonomy, a list of links or a list of tags could be what we get from the source, but in the Drupal world we need to create the terms first and then populate the field with their IDs.

The same goes for images. How do we download the image? Where do we save it? Likewise, we need to create a file entity and deal with the file ID in the migration.

 

What tools are available?

There are a number of process plugins available for migrations, and below are some handy ones.

 

Strings and arrays

 

Concat

It is a good choice for combining a number of fields into a single string. In this example we are converting an address with many subfields into one single address field.

Migrate
process:
  field_address:
    plugin: concat
    source:
      - street
      - suburb
      - state
      - postcode
    delimiter: ', '
Source
{
  "street": "1 Davey St",
  "suburb": "Hobart",
  "state": "TAS",
  "postcode": "7000"
}
Output
"1 Davey St, Hobart, TAS, 7000"

 

Explode

The explode plugin allows us to break up a single string into an array.

Migrate
process:
  field_address:
    plugin: explode
    source: address
    delimiter: ', '
Source
{
  "address": "1 Davey St, Hobart, TAS"
}
Output
["1 Davey St", "Hobart", "TAS"]

 

Substr

The substr plugin allows us to extract a segment of a string. In this example we are getting only the state from an address string.

Migrate
process:
  field_state:
    plugin: substr
    source: address
    start: -3
    length: 3
Source
{
  "address": "1 Davey St, Hobart, TAS"
}
Output
"TAS"

These plugins should all feel familiar because they behave like their PHP function counterparts.
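For comparison, here is a minimal sketch of the plain PHP calls that correspond to the three plugins above. It uses the same sample address values; the variable names are made up for illustration.

```php
<?php

$parts = ['1 Davey St', 'Hobart', 'TAS', '7000'];

// concat: join several values with a delimiter.
$address = implode(', ', $parts);

// explode: split a string on a delimiter.
$pieces = explode(', ', '1 Davey St, Hobart, TAS');

// substr: take three characters, counting from the end of the string.
$state = substr('1 Davey St, Hobart, TAS', -3, 3);
```

Note that the delimiter includes the space after the comma. Splitting on a bare comma would leave a leading space on every piece after the first.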

 

Default value

In some cases, when the data is not available in the source, we can use a default value. In this example, we are setting the UID to 12 for all the articles.

Migrate
process:
  title: title
  body: body
  uid:
    plugin: default_value
    default_value: 12
Source
{
  "title": "Article 1",
  "body": "This is a good article."
}
Output

Node author = User ID 12
 

Static map

Static map allows us to create a one-to-one mapping from one set of values to another. In Drupal 7, user roles are identified by numeric role IDs (rids), and we can map them to role machine names so that they work in Drupal 8.

Migrate
process:
  roles:
    plugin: static_map
    source: rids
    map:
      3: administrator
      4: moderator
      5: editorial_board
      6: site_architect
Source
{
  "rids": [3, 5, 6]
}
Output
User roles:
  Administrator
  Editorial board
  Site architect

 

Format date

The date format we get from the source may not match the format required for the import. We can use the format_date plugin to transform date formats.

Migrate
process:
  field_date:
    plugin: format_date
    from_format: 'j F Y'
    to_format: 'Y-m-d'
    source: date
Source
{
  "date": "26 November 2019"
}
Output
2019-11-26
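The from_format and to_format values are PHP date format strings, so the same conversion can be tried out in plain PHP. This is a minimal sketch; the leading "!" in the parse format is an addition of this example, used to reset the fields that the format does not cover.

```php
<?php

// Parse the source format: day without leading zero, full month name, year.
$date = DateTime::createFromFormat('!j F Y', '26 November 2019');

// Re-emit the date in the year-month-day format the import needs.
$output = $date->format('Y-m-d');
```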

 

Entity reference

Entity references can be simple if we have the target ID in the source, which is possible if we are doing Drupal-to-Drupal migrations.

Source
{
  "tids": [14, 15, 17, 32]
}
Migrate
process:
  field_tags: tids

 

Entity lookup

In most cases, we get a list of names from the source and can use Entity lookup for it. It takes the entity name and looks up the ID of the matching entity.

Source
{
  "tags": ["tag 1", "tag 2", "tag 3"]
}
Migrate
process:
  field_tags:
    plugin: entity_lookup
    source: tags
    value_key: name
    entity_type: taxonomy_term
    bundle: tags

 

Entity generate

If there are new items, the “Entity generate” plugin can create new terms on the fly when migrating the term field.

Notes from Quietone: The Entity generate plugin creates entities that are not in our migration map. The entities created cannot be rolled back or found in lookups. The plugin is useful, but it does not follow the rules of the ETL process.

Source
{
  "tags": [ "A new tag", "tag 1"]
}
Migrate
process:
  field_tags:
    plugin: entity_generate
    source: tags
    value_key: name
    entity_type: taxonomy_term
    bundle: tags
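The difference between the two plugins can be pictured with a plain PHP array standing in for the existing terms. This is a conceptual sketch only: the functions lookup_term and generate_term and the term IDs are hypothetical, and the real plugins work against Drupal's entity storage.

```php
<?php

// Hypothetical existing terms: name => term ID.
$terms = ['tag 1' => 14, 'tag 2' => 15];

// Lookup only: an unknown name resolves to NULL.
function lookup_term(array $terms, string $name): ?int {
  return $terms[$name] ?? null;
}

// Lookup or create: an unknown name gets a new ID and is remembered.
function generate_term(array &$terms, string $name): int {
  if (!isset($terms[$name])) {
    $terms[$name] = $terms ? max($terms) + 1 : 1;
  }
  return $terms[$name];
}

$missing = lookup_term($terms, 'A new tag');

$tids = [];
foreach (['A new tag', 'tag 1'] as $name) {
  $tids[] = generate_term($terms, $name);
}
```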

 

Migration lookup

Migration lookup is another awesome plugin. It allows us to reference an entity that has been created by another migration.

In this example we have a list of users that were imported first. Then, when we import the articles, we are able to reference a user that was created in the first migration.

Migrate
process:
  uid:
    plugin: migration_lookup
    migration: users
    source: author
Source (users)
{
  "id": 1,
  "first name": "Peter",
  "last name": "Smith"
}
Source (articles)
{
  "title": "Article 1",
  "body": "This is a good article...",
  "author": 1
}
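Conceptually, the lookup consults a record of which source ID became which Drupal ID in the earlier migration. Here is a minimal sketch of that idea in plain PHP; the $user_id_map array and the user ID 42 are made-up values for illustration.

```php
<?php

// Hypothetical ID map written by the "users" migration:
// source ID => ID of the Drupal user created for it.
$user_id_map = [1 => 42];

$article = ['title' => 'Article 1', 'author' => 1];

// Resolve the source author ID; NULL when that user was never migrated.
$uid = $user_id_map[$article['author']] ?? null;
```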

 

Files

 

Download

The download plugin allows us to grab a file from a remote location and save it to a destination. Then we can use migration lookup to populate the file ID into a file field.

Migrate
process:
  filename: filename
  filemime: filemime
  status:
    plugin: default_value
    default_value: 1
  uri:
    plugin: download
    source:
      - file_source
      - file_destination
Source
{
  "id": "1",
  "filename": "file1.pdf",
  "filemime": "application/pdf",
  "file_source":
    "http://example.com/file1.pdf",
  "file_destination":
    "public://documents/file1.pdf"
}

 

File import

The file import plugin provides a much simpler way to handle file migrations. It combines the creation of the file entity and the migration lookup in one go.

We can see the source required for the import is much simpler, and all we need is the URL of the file. 

Migrate
process:
  field_file:
    plugin: file_import
    source: file
Source
{
  "file": "http://example.com/file1.pdf"
}

 

Image import

It’s the same thing for images. We can use the Image Import plugin to import images.

Migrate
process:
  field_image:
    plugin: image_import
    source: image
    destination: 
      plugin: default_value
      default_value:
        "public://images/"
    title: image_title
    alt: '!title'
Source
{
  "image":
    "https://example.com/logo.png",
  "image_title": "Logo"
}

 

URL

 

URL redirect

We may also need to handle URL redirects or aliases. Redirects are entities in Drupal 8, so we can do a simple migration with the destination plugin set to entity:redirect.

Migrate
process:
  redirect_source: old_url
  redirect_redirect: new_url
  status_code:
    plugin: default_value
    default_value: 301
  ...
destination:
  plugin: 'entity:redirect'
Source
{
  "old_url": "old-url",
  "new_url": "internal:/node/54"
}
Output

301 redirect from old-url to /node/54

 

URL alias

The same goes for aliases. There is a destination plugin to handle URL aliases, which gives our content an SEO-friendly URL.

Migrate
process:
  source: source
  alias: alias
  langcode:
    plugin: default_value
    default_value: 'en'
destination:
  plugin: url_alias
Source
{
  "source": "/node/5",
  "alias": "/article/good-article"
}
Output

Set alias /article/good-article to node/5

 

More

 

Callback

The callback plugin allows us to use PHP functions to process our data. In this example, we are using a function from Drupal core to convert line breaks into <p> tags.

Migrate
process:
  body/value:
    plugin: callback
    callable: _filter_autop
    source: body
Source
{
  "body":
    "Lorem ipsum dolor sit amet, 
     consectetur adipiscing elit. 

     Curabitur aliquet quam id dui 
     posuere blandit ...
    "
}
Output

Converts line breaks into <p> and <br>

This is the same function that the line break input filter calls.

Enabled filters

Pipeline

Pipeline itself is not a plugin, but it allows us to run a number of plugins sequentially. In this example, after running the text through the line break filter, we run the str_replace plugin to fix typos.

Migrate
process:
  body/value:
    -
      plugin: callback
      callable: _filter_autop
      source: body
    -
      plugin: str_replace
      search: ["typo 1", "typo 2", ...]
      replace: ["correction 1", "correction 2", ...]
Source
{
  "body":
    "Lorem ipsum dolor sit amet, 
     consectetur adipiscing elit. 

     Curabitur aliquet quam id dui 
     posuere blandit ...
    "
}
Output

Body text with strings replaced.
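The chaining can be sketched in plain PHP as a list of callables, where each step receives the output of the previous one. This is an illustration only: nl2br() stands in for the callback step because _filter_autop() needs Drupal to be loaded, and the sample typos are made up.

```php
<?php

// Each step takes the previous step's output, like a process pipeline.
$steps = [
  // Stand-in for the callback step; nl2br() is NOT _filter_autop().
  fn (string $text): string => nl2br($text),
  // Same idea as str_replace: parallel search and replace lists.
  fn (string $text): string => str_replace(
    ['teh', 'recieve'],
    ['the', 'receive'],
    $text
  ),
];

$body = "We recieve teh data.\nThen we clean it.";

// Feed the body through every step in order.
$output = array_reduce(
  $steps,
  fn (string $carry, callable $step): string => $step($carry),
  $body
);
```

The order of the steps matters, just as it does in the YAML: a replacement that runs before the markup is added sees different text from one that runs after it.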

 

Custom plugins

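If none of the available plugins can handle our data, we can create our own custom plugin. The namespace and plugin ID below assume a hypothetical module named my_module.

namespace Drupal\my_module\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * @MigrateProcessPlugin(id = "transform_value")
 */
class TransformValue extends ProcessPluginBase {
  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // Reverse the incoming string.
    return strrev($value);
  }
}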

Reference: https://www.drupal.org/docs/8/api/migrate-api/migrate-process/writing-a…

 

List of core Migrate process plugins

There are a lot of process plugins available. I have covered only a few of them.

  • array_build
  • callback
  • concat
  • default_value
  • download
  • entity_exists
  • explode
  • extract
  • file_copy
  • flatten
  • format_date
  • get
  • log
  • machine_name
  • menu_link_parent
  • migration_lookup
  • MakeUniqueBase
  • make_unique_entity_field
  • null_coalesce
  • route
  • skip_on_empty
  • skip_row_if_not_set
  • static_map
  • substr
  • sub_process
  • url_encode

 

Process plugins by Migrate Plus

And there are more from contrib modules.

  • array_pop
  • array_shift
  • dom
  • dom_apply_styles
  • dom_str_replace
  • dom_migration_lookup
  • entity_lookup
  • entity_generate
  • file_blob
  • merge
  • multiple_values
  • single_value
  • skip_on_value
  • str_replace
  • transliteration

 

Useful links

Migrate process overview
https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/m…

List of core Migrate process plugins
https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/l… 

List of process plugins provided by Migrate Plus
https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/l… 

Writing a process plugin
https://www.drupal.org/docs/8/api/migrate-api/migrate-process/writing-a… 

 
