Failing safe with the Apex Data Loader for Salesforce



If you have a way to pull and format data from another system, the Apex Data Loader is a great, on-demand way to push data into Salesforce CRM. Without much effort, you can automate the process and create a hands-free data sync. The tricky part is that while the Data Loader creates excellent logs, there’s no obvious way to alert a person when a data sync fails. Once a system is up and running, failures are rare, but, in real life, stuff happens, and it helps to have failsafes in place.


A key problem is that the Apex Data Loader writes an error log whether there are errors or not. If the pass is 100% successful, the error log will have a header line and no rows.


If the error log were written only when errors occurred, we could just look for error logs and process the files whenever they crop up. But, in this case, we first need to look inside the log and count the lines before we can be sure there was a failure.
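The check itself is tiny in any scripting language. As a cross-platform sketch (Python here, purely for illustration), counting the data rows beyond the header tells us whether the pass actually failed:

```python
import csv

def count_error_rows(path):
    """Count data rows in a Data Loader error log, excluding the header.

    A header-only log (zero data rows) means the pass was 100% successful.
    """
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    return max(len(rows) - 1, 0)
```

A scheduler could call this and alert whenever the count comes back non-zero.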


On a Windows-based system, one solution is to use a VB script, like COUNTLN.VBS, to count the number of lines in a file.




' COUNTLN.VBS
' Description: Count the number of lines in a file and return as error level (up to 255)
' Usage: cscript countln.vbs errors.csv

Const ForReading = 1

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile(WScript.Arguments.Item(0), ForReading)

objTextFile.ReadAll

' Note: after ReadAll, .Line points one past the last line (an empty file returns 1)
' WScript.Echo "Number of lines: " & objTextFile.Line

WScript.Quit(objTextFile.Line)



The COUNTLN.VBS script returns the line count as the process exit code, which can then be tested in a DOS batch file, like CHECKLOG.BAT.


@ECHO OFF
REM CHECKLOG.BAT
REM Count the lines and send the errors, if any
ECHO CountLn: %1
CSCRIPT ..\countln.vbs %1
IF ERRORLEVEL 1 ECHO Line Count: %errorlevel%
REM Header plus at least one error row: add the log to the outgoing archive
IF ERRORLEVEL 3 ..\7z.exe a %2 %1
ERASE %1
ECHO Checklogs process complete.


The Apex Data Loader takes as input a comma-separated-value (CSV) or a tab-delimited file. The first row of the input file is a header detailing the input columns, with each subsequent row representing a record to be sent to the target Salesforce CRM object.


When the Apex Data Loader runs, it creates a success(ID).log and an error(ID).log. The (ID) is replaced with a timestamp shared by the companion success and error logs. Both logs contain a header row, and a row for each input row, detailing the data processed for each column. The success log contains all the rows that made it into Salesforce. The error log contains all the rows that did not cross over.
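Because companion logs share a timestamp, relating the two files programmatically is mechanical. Here is a minimal Python sketch; the `error<ID>.csv` / `success<ID>.csv` naming is an assumption, so adjust the pattern to match the file names your Data Loader version actually emits:

```python
import os
import re

# Assumed naming convention: error<ID>.csv / success<ID>.csv
LOG_PATTERN = re.compile(r"^(error|success)(\d+)\.csv$")

def pair_logs(folder):
    """Group companion logs by the timestamp ID embedded in their names."""
    pairs = {}
    for name in os.listdir(folder):
        match = LOG_PATTERN.match(name)
        if match:
            kind, stamp = match.groups()
            pairs.setdefault(stamp, {})[kind] = name
    return pairs
```

Each entry in the result maps a timestamp to whichever of the two logs were found, so a missing companion is easy to spot.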


Conveniently, the logs can also be used as input files. If the errors can be easily fixed, you can resubmit the error log to the Apex Data Loader, without going back to the external system.


The CHECKLOG.BAT file is designed to add the matching error files to a compressed archive in the widely supported ZIP format. Here we’re using the open source 7z utility to create the archive, but other tools would work as well. Because the archive accumulates files, the process works whether CHECKLOG runs as often as the Apex Data Loader itself or less frequently, handling a single log or several at once. And, if the data is sensitive, most ZIP utilities support encrypting the archive.
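For reference, the accumulate-into-an-archive step can be sketched with Python’s standard zipfile module (note that, unlike 7z and most ZIP utilities, the standard library cannot write encrypted archives):

```python
import os
import zipfile

def archive_logs(zip_path, log_paths):
    """Append log files to a ZIP archive, creating it if it doesn't exist.

    Opening in append mode lets one archive accumulate logs across
    several passes, just as CHECKLOG.BAT does with 7z.
    """
    with zipfile.ZipFile(zip_path, "a", zipfile.ZIP_DEFLATED) as zf:
        for path in log_paths:
            zf.write(path, arcname=os.path.basename(path))
```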


Given a secure ZIP archive, email can be a great way to get the error logs to an administrator who can address the problems. Staying with our Windows platform theme, here’s another VB script that can send an email alert, with an attachment, to a Gmail account.




' Usage: cscript sendmail.vbs <recipient> "<subject>" "<body>" ["<attachment>"]
' Ex. No attach: cscript sendmail.vbs example@gmail.com "test subject line" "test email body"
' Ex. W/ attach: cscript sendmail.vbs example@gmail.com "test subject line" "test email body" "c:\scripts\log.txt"
' Attachment path cannot be relative

' ---- CONFIGURE THE FROM EMAIL ADDRESS AND PASSWORD ----
Const fromEmail = "(TBD)@gmail.com"
Const password = "(TBD)"
' ---- END OF CONFIGURATION ----

Dim emailObj, emailConfig

Set emailObj = CreateObject("CDO.Message")
emailObj.From = fromEmail
emailObj.To = WScript.Arguments.Item(0)
emailObj.Subject = WScript.Arguments.Item(1)
emailObj.TextBody = WScript.Arguments.Item(2)
If WScript.Arguments.Count > 3 Then
    emailObj.AddAttachment WScript.Arguments.Item(3)
End If

Set emailConfig = emailObj.Configuration

' Standard CDO settings for sending through Gmail's SMTP server
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/smtpserver") = "smtp.gmail.com"
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 465
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/smtpusessl") = True
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/smtpauthenticate") = 1
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/sendusername") = fromEmail
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/sendpassword") = password
emailConfig.Fields.Update

emailObj.Send

Set emailObj = Nothing
Set emailConfig = Nothing




Of course, the notion is that you create a special Gmail account just to receive the alerts and then forward them to an administrator from there.


While we have the pieces in place for handling the error logs, there is also the question of what to do with the success logs. The Apex Data Loader kindly generates both logs, which, between them, contain all of the detail found in the original input file, for each and every pass.


While this level of detail is great (really-really-great), if you are running a sync every five minutes, retrieving a particular log later can be a real “can’t-see-the-forest-for-the-trees” headache.


Long story short, in addition to error alerts, an automated Apex Data Loader sync also needs help with general log management. The ARCHIVE.BAT file is designed to be run at least once a day, but could also be run every hour. It sweeps all of the success logs into an archive for the current date, and, as discussed, emails the error logs to a Gmail account.


To deploy the system, beneath the current logging folder (usually “Status”), create an Archive folder, and beneath that create Success and Error folders. Put ARCHIVE.BAT and the VBS files in the Archive folder (along with the 7z utility), and a copy of CHECKLOG.BAT in each of the Error and Success folders.


\Status
  .\Archive
    - archive.bat
    - countln.vbs
    - sendmail.vbs
    - 7z.exe
    - 7z.dll
    - 7-zip.dll
  .\Archive\Error
    - checklog.bat
  .\Archive\Success
    - checklog.bat


A call to ARCHIVE.BAT can be queued as a Windows Scheduled Task to run every hour, or once a day, depending on how often the Apex Data Loader runs. The system moves the logs out of .\Status and creates a ZIP for each day under .\Status\Archive in the format “LOGS-YYYY-MM-DD.ZIP” (while emailing the error logs to the Gmail account). Temporary copies of the Success and Error logs are also put into the Success and Error folders, for processing by CHECKLOG.BAT.




@ECHO OFF
REM ARCHIVE.BAT
REM Email non-empty Data Loader error logs to a GMail account and
REM archive to a zip file all other log files (*.csv).
REM --
REM Main file to call from a scheduled task.
REM Intended to be launched from a subdirectory directly beneath the log folder.
REM Expects an error\ subdirectory to exist (status\archive\error).
REM --
REM Temporarily stores error logs being emailed in errorlog.zip and
REM archives logs by date in LOGS-20??-??-??.ZIP
REM Uses as dependencies: checklog.bat, countln.vbs, sendmail.vbs, 7z.
REM Checklog.bat is stored in subdirectories .\error and .\success.
REM All others in status\archive next to this file.
REM --
REM Bring logs down to a working folder, to avoid gaps in process
REM --
IF EXIST ..\*.csv MOVE ..\*.csv .
IF NOT EXIST *.csv GOTO Done
REM --
REM Buffer errors and archive latest logs (\error must exist)
REM --
IF EXIST error*.csv COPY error*.csv error\
IF EXIST success*.csv COPY success*.csv success\
SET TDate=%date:~10,4%-%date:~4,2%-%date:~7,2%
.\7z.exe a LOGS-%TDate%.zip *.csv
IF ERRORLEVEL 1 GOTO Done
IF NOT EXIST LOGS-%TDate%.zip GOTO Done
ERASE *.csv

REM --
REM Cleanup errorlog.zip before calling checklog.bat
SET ZIP=(TBD: Full path to a local working directory)\errorlog.zip
IF EXIST %ZIP% ERASE %ZIP%
REM --
REM Call checklog for each error log.
REM --
CD error
FOR %%x IN (error*.csv) DO CMD.EXE /C checklog.bat %%x %%ZIP%%
ECHO Emailing: %ZIP%
REM Attachment path cannot be relative
IF EXIST %ZIP% CSCRIPT ..\sendmail.vbs ccsa.alert@gmail.com "Alert! Data Load Failed." "The Data Loader was unable to push records to Salesforce." "%ZIP%"
CD ..
REM --
REM Cleanup successlog.zip before calling checklog.bat
SET ZIP=(TBD: Full path to a local working directory)\successlog.zip
IF EXIST %ZIP% ERASE %ZIP%
REM --
REM Call checklog for each success log.
REM --
CD success
FOR %%x IN (success*.csv) DO CMD.EXE /C checklog.bat %%x %%ZIP%%
ECHO Emailing: %ZIP%
REM Attachment path (%ZIP%) cannot be relative
IF EXIST %ZIP% CSCRIPT ..\sendmail.vbs (TBD@TBD.COM) "Alert! Data Load Succeeded." "The Data Loader was able to push records to Salesforce." "%ZIP%"
CD ..

:Done
ECHO Archive process complete.



The second CHECKLOG call to the success folder is optional. When left in place, the system will email the success and error logs separately, in case the administrator wants assurance that the system is running successfully on schedule.


Another option would be to check for certain types of fatal errors. One fatal case is someone renaming a target field in Salesforce without updating the Apex Data Loader configuration. If the mapping file refers to a field that no longer exists, the Data Loader generates a zero-length success log (without even a header row). The CHECKFATAL.BAT alternative ZIPs for emailing only those success logs that are totally empty.


@ECHO OFF
REM CHECKFATAL.BAT
REM Count the lines and send only fatal errors, if any. Only use in Success folder.
ECHO CountLn: %1
CSCRIPT ..\countln.vbs %1
REM A non-empty log is not fatal; keep only zero-length logs for emailing
IF ERRORLEVEL 2 ERASE %1
IF EXIST %1 ..\7z.exe a %2 %1
ECHO Check fatal process complete.


The final touch is to prune the logs every so often. The PRUNELOGS.BAT batch file can be run once a month, on the first of the month. Based on the date embedded in each file name and the current month, PRUNELOGS.BAT retains two to three months of logs and deletes the rest.




@ECHO OFF
REM PRUNELOGS.BAT
REM ZIP delete the archive for month before last (retain 2-3 months of logs)
REM Intended to be called on the 1st day of each month
SET MM=%date:~-10,2%
IF "%MM%" == "01" ERASE LOGS-20??-11-??.ZIP
IF "%MM%" == "02" ERASE LOGS-20??-12-??.ZIP
IF "%MM%" == "03" ERASE LOGS-20??-01-??.ZIP
IF "%MM%" == "04" ERASE LOGS-20??-02-??.ZIP
IF "%MM%" == "05" ERASE LOGS-20??-03-??.ZIP
IF "%MM%" == "06" ERASE LOGS-20??-04-??.ZIP
IF "%MM%" == "07" ERASE LOGS-20??-05-??.ZIP
IF "%MM%" == "08" ERASE LOGS-20??-06-??.ZIP
IF "%MM%" == "09" ERASE LOGS-20??-07-??.ZIP
IF "%MM%" == "10" ERASE LOGS-20??-08-??.ZIP
IF "%MM%" == "11" ERASE LOGS-20??-09-??.ZIP
IF "%MM%" == "12" ERASE LOGS-20??-10-??.ZIP
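The twelve-branch table works, but it leaves the year implicit. For comparison, here is a hedged Python sketch of the same rule (delete the daily archives for the month before last), with the year wrap handled by arithmetic instead of a lookup table:

```python
import datetime

def month_before_last(today):
    """Return (year, month) for the month before last, wrapping the year."""
    year, month = today.year, today.month - 2
    if month < 1:          # January or February: wrap into the prior year
        month += 12
        year -= 1
    return year, month

def prune_pattern(today):
    """Glob pattern for the daily archives to delete, e.g. LOGS-2012-01-*.ZIP."""
    year, month = month_before_last(today)
    return "LOGS-%04d-%02d-*.ZIP" % (year, month)
```

Run on the first of January 2012, for example, this targets the November 2011 archives, matching the batch file’s "01" branch.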



With these patches in place, the Apex Data Loader can be a safe and reliable way to automatically import data files generated by another system. Still, no data integration system is foolproof, so you should also devise a custom validation report that runs every day to confirm that data is flowing into the system as expected.




While it would be wonderful if we could originate and retain all of our enterprise data only in Salesforce CRM, in real life, we still need to integrate with other systems, and, when we do, tools like the Apex Data Loader, with a pinch of salt, can help us get the job done.

For more Data Loader tips, drop by the blog Using the Apex Data Loader with Salesforce.

Salesforce – Open Source Pioneer



Before joining NimbleUser, I spent a decade working with open source software and participating in open source communities. Being part of an open source community, and helping to improve tools we use every day, is an enlightening experience, well beyond the scope of a simple blog entry.

And I miss it. I’ve tried to stay involved with new projects like Vosao CMS, I answer mail sent to human-response at apache-dot-org, I speak at ApacheCon when I can, and I’m trying to bootstrap an advocacy site with Tim O’Brien. But, it’s a struggle to keep up with other projects when I’m still learning so much every day about the Salesforce CRM platform.

One consolation is that Salesforce is built with many products I touched during my open source decade. I worked with Doug Cutting and Jon Stevens to migrate Lucene to Apache Jakarta. Salesforce uses several Apache Commons products that I helped create (along with the Commons itself). So, although I don’t get to work on open source projects as much as I used to, I am grateful to be working with a solid, thoughtful product built on a backbone of truly awesome open source components.

Core infrastructure

Salesforce.com pioneered cloud computing. Its combined Software-as-a-Service and Platform-as-a-Service offering handles 100,000+ customers, over 2.1 million users, 30+ million lines of third-party code, and hundreds of terabytes of data, running on its own blend of open source, proprietary, and custom code. [Wikipedia]

While the Salesforce platform uses the Oracle Database system as the basis of its backend data store, the web infrastructure is powered by several open source products, like Caucho Resin. The Resin Application Server is a high-performance XML application server for use with JSPs, servlets, JavaBeans, XML, and a host of other technologies. [Hurwitz]

To help users find data quickly and easily, in 2002, Salesforce.com added the open source Apache Lucene search engine to its technology stack. Today, Lucene manages an eight-terabyte index of about twenty billion documents on the Salesforce platform. The Salesforce/Lucene cluster consists of roughly sixteen machines, which in turn contain many small (sharded) Lucene indexes. The system handles roughly 4,000 queries per second, and it provides an incremental indexing model in which new Salesforce user data becomes searchable within approximately three minutes, through use of a custom optimizer. [Igvita]

Quality assurance

Salesforce.com brings out three major releases a year, and to keep development on track, their quality assurance team runs 50,000 tests a day. About 20,000 of those tests run under the open source Selenium web application testing system. Selenium supports exporting tests to native application languages, like Java and C#, and a Salesforce.com engineer open sourced a Selenium extension for Salesforce’s own language, Apex. [Porro, Nagata]

User-facing tools

Several of the Salesforce external data manipulation tools are built with open source components or are themselves open source. The indispensable Salesforce IDE is an Eclipse plugin. The popular Force.com Excel Connector is hosted at code.google.com. Responses to Outbound Messages are handled by the Apache Commons Connection Pool. The very useful Apex Data Loader is powered by the Spring Framework and Apache Commons BeanUtils. Since the Data Loader is open source, an OSX version sprang up (LexiLoader), and there is also an open source wizard (DataLoaderCliq) to help with Data Loader automation.

Advocacy

At its 2011 Dreamforce Conference, Salesforce.com offered an entire track of presentations that encouraged users to adopt other open source tools, including a developer session entitled “Build Better Apps with Open Source”. On an ongoing basis, Salesforce.com hosts an Open Source forum as part of its support site. [SFDC]

By leveraging a solid core of open source packages, and mixing in its own proprietary customizations, Salesforce is able to deliver a robust, configurable platform, upon which millions of business users rely every day.


References

  1. [Hurwitz] Is there beef behind SalesForce.Com?
  2. [Wikipedia] Salesforce.com.
  3. [Igvita] Lucene in the wild: Salesforce, LinkedIn, Twitter, et al.
  4. [Porro, Nagata] How to test Salesforce Apps with Selenium.
  5. [SFDC] Apex Data Loader, Build Better Apps Using Open Source, Open Source Labs track, Open Source Forum.

Sun goes Nova. Cloud intact.


The Internet was designed to survive a nuclear holocaust, but no one expected to find social media networks still available after the Sun’s surprise flare-up on Saturday.

“While we talk about the bits being in the cloud, most of the Internet is actually underground,” explained Internet Guru Nick Francesco. “Although the solar flare burned off the ozone, condemning us all to a slow painful death, the good news is that we can still tweet about it.”


Saturday’s Twitter volume during the flare eclipsed the recent record set during the season finale of Pretty Little Liars. The light from the flare made earthfall at 7:48a EST, had its own Twitter account (@SunFlare) by 7:50a, and 439,624 followers by 8:15a. By noon, over 700,000 tweets referenced the flare, including choice nuggets like:


@Ecc19 “Wow! Something new over the earth.”

@SmileKid13 “#WorstDayEver #SunGoesViral.”

@Groupon “Designer Sunglasses - B1G1 - today only!”

@JStorm “Flame On!”


Elsewhere, Facebook exploded with blurry iPhone photos of a dilating ball of light, and most LinkedIn statuses changed to “Ready to move. Call me.” The flare passed through Google+ without a ripple, since most of the plussies were busy updating their last known address on Plaxo.


Salesforce expects Chatter volume to soar on Monday, as burn-outs return to work after a long, hairy weekend. “The flare is a real opportunity for growth,” blogged founder Marc Benioff. “Nothing brings out net-new leads like a good fire sale.”


Catch me on the flip-side: Responding to Outbound Messages in Salesforce CRM


In a previous blog, we walked through example code for an Outbound Message Listener. We covered the essentials, but Salesforce CRM offers a nifty twist on just sending an outbound message to an external system. Not only can a system receive a message, it can send back a response. For example, if the outbound message inserts a record, the external system can send back the record’s key.

Our example Listener called a worker method that handled the messages in a foreach loop. Using the case of returning an external key, we can add the keys to a hashtable and send back all of the responses in a single batch at the end. We’ll pick up the live action toward the end of the foreach loop from the previous example.

Hashtable responsePayload = new Hashtable();  // Returns IDs to SF
foreach (AccountNotification message in n.Notification)
{
    sf = message.sObject;

    // -- Check cache --
    // ...
    // -- Handle message --
    // ...
    // Queue ID for return to SF
    responsePayload.Add(sf.Id, my.Id);
} // end foreach loop

// -- Return IDs to SF --
log.Info("Info: Sending response to Salesforce ...");
Response response = new Response();
log.Info(response.Send("Account", responsePayload));
return;
}

The Response class is a wrapper around an SforceService instance; its Send method returns a String we can use as a logging statement.

class Response
{
    private SforceService service;

    public Response() { }

    public String Send(String objectName, Hashtable payload)
    {
        return this.Send(objectName, payload, "External_Id__c");
    }

    public String Send(String objectName, Hashtable payload, String syncFieldName)
    {
        if (payload == null || payload.Count == 0)
            return "No records updated. Response.Send() not needed.";
        try
        {
            service = new SforceService();
            LoginResult loginResult = service.login(
                ConfigurationManager.AppSettings["Salesforce.Username"],
                ConfigurationManager.AppSettings["Salesforce.Password"] +
                ConfigurationManager.AppSettings["Salesforce.Token"]
            );
            service.SessionHeaderValue = new SessionHeader();
            service.SessionHeaderValue.sessionId = loginResult.sessionId;
            service.Url = loginResult.serverUrl;
            GetUserInfoResult userInfo = loginResult.userInfo;
            List<sObject> sObjects = new List<sObject>();
            sObject s = new sObject();
            XmlDocument xmlDocument = new XmlDocument();
            List<XmlElement> elements;
            XmlElement element;
            foreach (String sfid in payload.Keys)
            {
                s = new sObject();
                s.Id = sfid;
                s.type = objectName;
                elements = new List<XmlElement>();
                element = xmlDocument.CreateElement(syncFieldName);
                element.InnerText = payload[sfid].ToString();
                elements.Add(element);
                s.Any = elements.ToArray();
                sObjects.Add(s);
            }
            SaveResult[] saveResults = service.update(sObjects.ToArray());
            String output = String.Empty;
            foreach (SaveResult sr in saveResults)
            {
                if (sr.success)
                {
                    output += String.Format(
                        "{0} Response Sync Success : sf.id=\"{1}\"",
                        objectName, sr.id) + "\n";
                }
                else
                {
                    foreach (Error err in sr.errors)
                    {
                        output += String.Format(
                            "{0} Response Sync Error : sf.id=\"{1}\", {2}",
                            objectName, sr.id, err.message) + "\n";
                    }
                }
            }
            return output;
        }
        catch (Exception ex)
        {
            return ex.Message + " -- " + ex.StackTrace;
        }
    }
}

The credentials and the API service endpoint are declared in the Web.Config file. The appSettings keys match the names read by the Response class:

<appSettings>
  <add key="Salesforce.Username" value="(TBD)" />
  <add key="Salesforce.Password" value="(TBD)" />
  <add key="Salesforce.Token" value="(TBD)" />
</appSettings>

with the login calls going to the Salesforce SOAP endpoint:

https://test.salesforce.com/services/Soap/u/24.0

(In a production environment, replace “test.salesforce.com” with “www.salesforce.com”.)

Some caveats are

(1) The response needs credentials to log in, and you might need to set up a system account for the response.

(2) To keep the message from recursing, the Response User needs to have outbound messaging disabled, or be specifically excluded from the Outbound Messaging Workflow.

(3) If your listener implements a producer/consumer queue, the response may not be immediate. (If tens of thousands of records are being updated by a slow system, it could be an hour or more!).

(4) An outbound message response may update the record while it’s being viewed by another user. If the user makes another change without refreshing the record, Salesforce will decline the second change, and the human user will have to refresh and repeat.

By allowing a response to an outbound message, Salesforce gives us the opportunity to close the loop. For example, if a record is being inserted into an external system, the response could return the external ID, for use with future updates.

While not a “point-and-click” product, like the Apex Data Loader, the outbound messaging API provides developers all the hooks we need to create a full-service data connection.

In Data We Trust

Data is the lifeblood of a company. When data flows freely to those who need it most, business booms. But, should the flow stop, a business fails. According to the National Archives & Records Administration, after a major data loss or extended data center outage (of ten days or more), most companies shut down within a year of the disaster.

Despite the stakes, according to a national survey, many businesses do not have proper data backup.


* 40% of small and medium-sized businesses don’t back up their data at all. Up to 50% of all backups are not fully recoverable, and up to 60% of all backups fail in general.
* Since 2003, the majority of North American workers are mobile. One-third of all computers sold are laptops, and 60% of all data is held on desktop PCs and laptops.
* 15% or more of laptop computers are stolen or suffer hard drive failures.
To avoid becoming a statistic, every day, more and more businesses are looking to cloud-based software-as-a-service vendors to provide a one-stop IT shop.

When workers have instant access to data and files through a web browser, people don’t feel the need to store key data on local machines. When data is centralized on the cloud, it can be backed up and maintained by the best and the brightest. Between Google Apps and Salesforce CRM, many professionals will have all the software they need, saving the cost of licensing and maintaining complex office suites. Today, cloud-based offices can securely access their data from any machine with an Internet connection and work seamlessly from any location.

Of course, when a company moves core data to the cloud, it must be just as careful when choosing an IT provider as it should be when choosing IT personnel. The Salesforce CRM platform offers businesses a number of key safeguards, including trust.salesforce.com, release trains, and superb test coverage.

trust.salesforce.com

Salesforce.com believes that trust starts with transparency, and provides real-time information on system performance and security on a public web site that anyone can visit at trust.salesforce.com. The System Status panel lists each of the Salesforce servers, flagging any current performance issues and providing an RSS feed with historical information going back to Sep 2010. The trust.salesforce.com site also provides customers with background information on data integrity, security, and overall best practices.

Release Trains

Like clockwork, three times a year, Salesforce rolls out a major new release. All customers are upgraded automatically and seamlessly, at no additional cost, ensuring that everyone has all the latest fixes and improvements. The train model means that Salesforce.com can bring out new features incrementally. When the next train will be along in a couple of months, there’s no need to rush an under-developed feature to market. Certified administrators and developers must also pass a maintenance exam for every new release, assuring that key people are aware of the latest changes.

Test Coverage

Every day, Salesforce.com runs over 50,000 quality assurance tests on its own software. When Salesforce.com introduced the Apex customization language in 2007, mandatory test coverage was baked into the development model. Salesforce.com provides a platform based on high-quality, thoroughly tested code, and expects custom developers to do the same.

If you are ready for a safe, reliable, professionally-maintained business platform, Salesforce.com, the original Software as a Service provider, is still the best.