Failing safe with the Apex Data Loader for Salesforce



If you have a way to pull and format data from another system, the Apex Data Loader is a great, on-demand way to push data into Salesforce CRM. Without much effort, you can automate the process and create a hands-free data sync. The tricky part is that while the Data Loader creates excellent logs, there’s no obvious way to alert a person when a data sync fails. Once a system is in place, failures are rare, but, in real life, stuff happens, and it’s helpful if failsafes are in place.


A key problem is that the Apex Data Loader writes an error log whether there are errors or not. If the pass is 100% successful, the error log will have a header line and no rows.


If the error log was only written when errors occurred, we could just look for the error logs, and process the files whenever they crop up. But, in this case, we first need to look inside the log and count the lines before we can be sure there was a failure.


On a Windows-based system, one solution is to use a VB script, like COUNTLN.VBS, to count the number of lines in a file.




‘ COUNTLN.VBS
‘ Description: Count the number of lines in a file and return as error level (up to 255)

‘ Usage: cscript countln.vbs errors.csv


Const ForReading = 1

Set objFSO = CreateObject(“Scripting.FileSystemObject”)

Set objTextFile = objFSO.OpenTextFile (WScript.Arguments.Item(0), ForReading)

objTextFile.ReadAll

‘ Wscript.Echo “Number of lines: “ & objTextFile.Line

WScript.Quit(objTextFile.Line)



The COUNTLN.VBS script sets the exit condition, which can then be tested in a DOS Batch file, like CHECKLOG.BAT.


@ECHO OFF

REM CHECKLOG.BAT

REM Count the lines and send the errors, if any

ECHO CountLn: %1

CSCRIPT ..\countln.vbs %1

IF ERRORLEVEL 1 ECHO Line Count: %errorlevel%

IF ERRORLEVEL 3 ..\7z.exe a %2 %1

ERASE %1

ECHO Checklogs process complete.


The Apex Data Loader takes as input a common-separated-value (CSV) or a tab-delimited file. The first row of the input file is a header detailing the input columns, with each subsequent row representing a record to be sent to the target Salesforce CRM object.


When the Apex Data Loader runs, it creates an success(ID).log and an error(ID).log. The (ID) is replaced with a timestamp shared by the companion success and error logs. Both logs contain a header row, and a row for each input row, detailing the data processed for each column. The success log contains all the rows that made it into Salesforce. The error logs contains all the rows that did not cross over.


Conveniently, the logs can also be used as input files. If the errors can be easily fixed, you can resubmit the error log to the Apex Data Loader, without going back to the external system.


The CHECKLOG.BAT file is designed to add the matching error files to a compressed archive file in the widely-supported ZIP format. Here we’re using the open source 7z utility to create the archive, but others would work as well. In this way, if CHECKLOG is is not run as often as the Apex Data Loader itself, the process can support a single file or multiple files. And, if the data is sensitive, most ZIP utilities support encrypting the archive.


Given a secure ZIP archive, email can be a great way to get the error logs to an administrator who can address the problems. Staying with our Windows platform theme, here’s another VB script that can send an email alert, with an attachment, to a Gmail account.




‘Usage: cscript sendmail.vbs <email_recipient@example.com> “” “” “”
‘Ex. No attach: cscript sendmail.vbs example@gmail.com “test subject line” “test email body”

‘Ex. W/ attach: cscript sendmail.vbs example@gmail.com “test subject line” “test email body” “c:\scripts\log.txt”

‘Attachment path cannot be relative


CONFIGURE THE FROM EMAIL ADDRESS AND PASSWORD

Const fromEmail = “(TBD)@gmail.com”

Const password = “(TBD)”

END OF CONFIGURATION

Dim emailObj, emailConfig

Set emailObj = CreateObject(“CDO.Message”)

emailObj.From = fromEmail

emailObj.To = WScript.Arguments.Item(0)

emailObj.Subject = WScript.Arguments.Item(1)

emailObj.TextBody = WScript.Arguments.Item(2)

If WScript.Arguments.Count > 3 Then

emailObj.AddAttachment WScript.Arguments.Item(3)

End If

Set emailConfig = emailObj.Configuration








emailConfig.Fields.Update

emailObj.Send

Set emailobj = nothing

Set emailConfig = nothing




Of course, the notion is that you create a special Gmail account just to receive the alerts and then forward to an administrator from there.


While we have the pieces in place for handling the error logs, there is also the problem of what to do with the success logs. The Apex Data Loader kindly generates both logs, that between them, contain all of the detail found in the original input file, for each and every pass.


While this level of detail is great (really-really-great), if you are running a sync every five minutes, retrieving a log later can be a real “cant-see-the-forest–for-the-trees” headache.


Long story short, in addition to error alerts, an automated Apex Data Loader sync also needs help with general log management. The ARCHIVE.BAT file is designed to be run at least once a day, but could also be run every hour. It sweeps all of the success logs into an archive for the current date, and, as discussed, emails the error logs to a Gmail account.


To deploy the system, beneath the current logging folder (usually “Status”), create an Archive folder, and beneath that create Success and Error folders. Put the ARCHIVE.BAT and VBS files under the Archive folder (along with the 7z utility), and the CHECKLOG.BAT under the Error folder.


\Status

.\Archive

- archive.bat

- countln.vbs

- sendmail.vbs

- 7z.exe

- 7z.dll

- 7z-zip.dll

.\Archive\Error

- checklog.bat

.\Archive\Success

- checklog.bat


A call to ARCHIVE.BAT can be queued as a Windows Scheduled Task to run every hour, or once a day, depending on how often the Apex Data Loader runs. The system moves the logs out of .\Status and creates a ZIP for each day under .\Status\Archive in the format “LOGS-YYMMDD.ZIP” (while emailing the error logs to the Gmail account). Temporary copies of the Success and Error logs are also put into the Success and Error folders, for processing by CHECKLOG.BAT.




@ECHO OFF
REM ARCHIVE.BAT

REM Email non-empty Data Loader error logs to a GMail account and

REM archive to a zip file all other log files (.csv).

REM –

REM Main file to call from a scheduled task.

REM Intended to be launched from a subdirectory directly beneath the log folder.

REM Expects an error\ subdirectory to exist (status\archive\error).

REM –

REM Temporarily stores error logs being emailed in errorlog.zip and

REM calls archivelogs.bat to store logs by date in LOGS-20??-??-??.ZIP

REM Uses as dependencies: checklog.bat, countln.vbs, sendmail.vbs, 7z.

REM Checklog.bat is stored in subdirectories .\error and .\success.

REM All others in status\archive next to this file.

REM –

REM Bring logs down to a working folder, to avoid gaps in process

REM –

IF EXIST ..\.csv MOVE ..*.csv .

IF NOT EXIST .csv GOTO Done

REM –

REM Buffer errors and archive latest logs (\error must exist)

REM –

IF EXIST error.csv COPY error.csv error\

IF EXIST success.csv COPY success.csv success\

SET TDate=%date:~10,4%%date:~4,2%%date:~7,2%

.\7z.exe a LOGS-%TDate%.zip .csv

IF ERRORLEVEL 1 GOTO Done

IF NOT EXIST LOGS-%TDate%.zip GOTO Done

ERASE .csv

REM –

REM Cleanup errorlog.zip before calling errorlogs.bat

SET ZIP=(TBD: Full path to a local working directory)\errorlog.zip

IF EXIST %ZIP% ERASE %ZIP%

REM –

REM Call checklog for each error log.

REM –

CD error

FOR %%x IN ( error.csv ) DO CMD.EXE /C checklog.bat %%x %%ZIP%%

ECHO Emailing: %ZIP%

REM Attachment path cannot be relative

CSCRIPT ..\sendmail.vbs ccsa.alert@gmail.com “Alert! Data Load Failed.” “The Data Loader was unable to push records to Salesforce.” “%ZIP%”

CD ..

REM –

REM Cleanup successlog.zip before calling checklogs.bat

SET ZIP=(TBD: Full path to a local working directory)\successlog.zip

IF EXIST %ZIP% ERASE %ZIP%

REM –

REM Call checklog for each success log.

REM –

CD success

FOR %%x IN (success*.csv ) DO CMD.EXE /C checklog.bat %%x %%ZIP%%

ECHO Emailing: %ZIP%

REM Attachment path (%ZIP%) cannot be relative

CSCRIPT ..\sendmail.vbs (TBD@TBD.COM) “Alert! Data Load Succeeded.” “The Data Loader was able to push records to Salesforce.” “%ZIP%”

CD ..

:Done

ECHO Archive process complete.



The second CHECKLOG call to the success folder is optional. When left in place, the system will email the success and error logs separately, in case the administrator wants assurance that the system is running successfully on schedule.


Another option would be to check for certain types of fatal errors. One fatal case would be someone renaming a target field in Salesforce without updating the Apex Data Loader configuration. If the mapping file refers to a field that no longer exists, the Data Loader generates a zero-length success log (with not even a header row). The CHECKFATAL.BAT alternative only ZIPs for emailing any success logs that are totally empty.


@ECHO OFF

REM CHECKFATAL.BAT

REM Count the lines and send only fatal errors, if any. Only use in Success folder.

ECHO CountLn: %1

CSCRIPT ..\countln.vbs %1

IF ERRORLEVEL 2 ERASE %1

IF EXIST %1 ..\7z.exe a %2 %1

ECHO Check fatal process complete.


The final touch is to prune the logs every so often. The PRUNELOGS.BAT batch file can be run once a month, on the first of the month. Based on the date represented by the file name, PRUNELOGS.BAT retains 2-3 months of logs, based on the current month, and deletes the rest.




@ECHO OFF
REM PRUNELOGS.BAT

REM ZIP delete the archive for month before last (retain 2-3 months of logs)

REM Intended to be called on the 1st day of each month

SET MM %date:~-10,2

IF “%MM%“ == “01” ERASE LOGS-20??-11-??.ZIP

IF “%MM%“ == “02” ERASE LOGS-20??-12-??.ZIP

IF “%MM%“ == “03” ERASE LOGS-20??-01-??.ZIP

IF “%MM%“ == “04” ERASE LOGS-20??-02-??.ZIP

IF “%MM%“ == “05” ERASE LOGS-20??-03-??.ZIP

IF “%MM%“ == “06” ERASE LOGS-20??-04-??.ZIP

IF “%MM%“ == “07” ERASE LOGS-20??-05-??.ZIP

IF “%MM%“ == “08” ERASE LOGS-20??-06-??.ZIP

IF “%MM%“ == “09” ERASE LOGS-20??-07-??.ZIP

IF “%MM%“ == “10” ERASE LOGS-20??-08-??.ZIP

IF “%MM%“ == “11” ERASE LOGS-20??-09-??.ZIP

IF “%MM%“ == “12” ERASE LOGS-20??-10-??.ZIP



With these patches in place, the Apex Data Loader can be a safe and reliable way to automatically import data files generated by another system. Though, no data integration system is foolproof, and you should still devise a custom validation report that runs every day to confirm that data is flowing into the system as expected.




While it would be wonderful if we could originate and retain all of our enterprise data only in Salesforce CRM, in real life, we still need to integrate with other systems, and, when we do, tools like the Apex Data Loader, with a pinch of salt, can help us get the job done.

For more Data Loader tips, drop by the blog Using the Apex Data Loader with Salesforce.