Azure Data Factory – MySQL to Azure SQL

The following steps describe how to move data from an on-premises MySQL server to Azure SQL Database.
Every Data Factory job has four key components:
Gateway, linked services, datasets, and pipeline.
The gateway is what provides access to your MySQL server. Usually, when setting up Data Factory in the Azure portal, you get a link to download and install the gateway on the server. I did this the other way around: I went to this link to install the gateway first,
then copied the key from the portal for the data factory, and within a few seconds the portal showed the data factory with 1 gateway online.
I have one VM with a static IP that can connect to all the test DBs and has the gateway installed. I just change the key whenever I need to connect it to a different data factory.
Next are your linked services. These are like connection strings to the various servers. Below I have added source and destination examples for a MySQL source and an Azure SQL target, but you can always change them. You can get more details on how to change the JSON based on your source and target here:

{
    "name": "SOURCE",
    "properties": {
        "description": "",
        "server": "",
        "database": "DBName",
        "schema": "",
        "authenticationType": "Basic",
        "username": "user_id",
        "password": "**********",
        "gatewayName": "GatewayName-CheckPortal",
        "encryptedCredential": null,
        "type": "OnPremisesMySqlLinkedService"
    }
}

{
    "name": "TARGET",
    "properties": {
        "description": "",
        "connectionString": "Data Source=tcp:ServerName.database.windows.net,1433;Initial Catalog=DBName;User ID=User_Id;Password=**********;Encrypt=True;TrustServerCertificate=False;Application Name=\"Azure Data Factory Linked Service\"",
        "type": "AzureSqlLinkedService"
    }
}


Next are your datasets. Here you are defining the location of the data within your source and target. The main property here is 'location', which gives the dataset type and table name and ties the dataset back to the linked service. So here you are defining your source of data within the servers defined above. Again, visit this link in case you want to change your endpoints:

{
    "name": "Source",
    "properties": {
        "published": false,
        "location": {
            "type": "RelationalTableLocation",
            "tableName": "tableName",
            "linkedServiceName": "SOURCE"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1,
            "waitOnExternal": {}
        }
    }
}

{
    "name": "Target",
    "properties": {
        "published": false,
        "location": {
            "type": "AzureSqlTableLocation",
            "tableName": "TableName",
            "linkedServiceName": "TARGET"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}

What connects all of this in the end is the pipeline. The type here is 'CopyActivity' since we are copying data; the source query selects the data we want to move, and the sink names the stored procedure we call to write the data into Azure SQL. You can keep this simple by just giving the table name on Azure SQL instead of a stored procedure. Example:

{
    "name": "Pipeline_MySQL_To_AzureSQL",
    "properties": {
        "activities": [
            {
                "type": "CopyActivity",
                "transformation": {
                    "source": {
                        "type": "RelationalSource",
                        "query": "select * from tableName limit 1000;"
                    },
                    "sink": {
                        "type": "SqlSink",
                        "sqlWriterStoredProcedureName": "spOverwriteSomeName",
                        "sqlWriterTableType": "SomeTableType",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    }
                },
                "inputs": [ { "name": "Source" } ],
                "outputs": [ { "name": "Target" } ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 2
                },
                "name": "MySQLToBlobCopyActivity"
            }
        ],
        "start": "2015-07-12T13:00:00Z",
        "end": "2015-07-12T16:00:00Z",
        "isPaused": false
    }
}
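The sink above references a stored procedure (spOverwriteSomeName) and a table type (SomeTableType) that must already exist on the Azure SQL side. As a rough sketch of what they could look like — the column list here is hypothetical and must match your actual source query, and the parameter naming follows the convention I used rather than anything mandated:

-- Hypothetical table type; columns must match the source query's result set.
CREATE TYPE SomeTableType AS TABLE (
    Id   INT,
    Name NVARCHAR(100)
);
GO

-- The copy activity passes each batch in as a table-valued parameter whose
-- type is the one named in sqlWriterTableType.
CREATE PROCEDURE spOverwriteSomeName
    @SomeTableType SomeTableType READONLY
AS
BEGIN
    -- "Overwrite" pattern: clear the target, then insert the incoming rows.
    TRUNCATE TABLE dbo.TableName;
    INSERT INTO dbo.TableName (Id, Name)
    SELECT Id, Name
    FROM @SomeTableType;
END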


A few notes from my experience:

Make sure to add waitOnExternal in your source dataset's availability section, or you will see "pending execution" or "pending validation" on the portal. If you don't know what to add there, just leave it empty and the defaults will be picked up.
Make sure the machine running the gateway has MySQL Connector 6.6.5 installed if you are trying to connect to a MySQL server. See this and then get this.
I also had the target tables ready in advance with a proper clustered index, so that no error is thrown in case the source didn't have one.
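For that last point, a minimal sketch of pre-creating a target table with a clustered primary key — the column names here are placeholders, so match them to your source table:

-- Hypothetical target table. Pre-creating it with a clustered index avoids
-- insert errors on the Azure SQL side even when the MySQL source table has
-- no equivalent key.
CREATE TABLE dbo.TableName (
    Id   INT NOT NULL,
    Name NVARCHAR(100) NULL,
    CONSTRAINT PK_TableName PRIMARY KEY CLUSTERED (Id)
);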