Spring Batch — Read from XML and write to Mongo

In this post, we will show you how to use Spring Batch to read an XML file with a StaxEventItemReader and write its data to a NoSQL store using a custom ItemWriter backed by a Spring Data MongoRepository. Here we've used MongoDB.

A custom ItemReader or ItemWriter is a class in which we implement our own way of reading or writing data. In a custom reader we are also responsible for the "next item" logic that feeds each chunk. This comes in handy when the reading logic is complex and cannot be handled by the default ItemReader implementations provided by Spring.
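To make the contract concrete, here is a minimal plain-Java sketch of what a custom reader does (the class and item type are illustrative, not from the project, and Spring's actual `ItemReader` interface is omitted so the snippet stands alone): return the next item on each call, and return null once the input is exhausted, which is how the framework knows the chunk loop is done.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the ItemReader contract: Spring Batch calls
// read() repeatedly to fill each chunk; returning null signals end of input.
public class InMemoryCustomerReader {

    private final Iterator<String> source;

    public InMemoryCustomerReader(List<String> customers) {
        this.source = customers.iterator();
    }

    public String read() {
        // Next item, or null when there is nothing left to read.
        return source.hasNext() ? source.next() : null;
    }
}
```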

Tools and libraries used:

- Java 1.8
- Maven
- Spring Boot (Spring Batch, Spring Data MongoDB, Spring OXM)
- XStream 1.4.7
- Lombok
- MongoDB

Maven dependencies — add the following to the project's pom.xml:

<properties>
    <java.version>1.8</java.version>
    <maven-jar-plugin.version>3.1.1</maven-jar-plugin.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-batch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-oxm</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-mongodb</artifactId>
    </dependency>
    <dependency>
        <groupId>com.thoughtworks.xstream</groupId>
        <artifactId>xstream</artifactId>
        <version>1.4.7</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
        <exclusions>
            <exclusion>
                <groupId>org.junit.vintage</groupId>
                <artifactId>junit-vintage-engine</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.springframework.batch</groupId>
        <artifactId>spring-batch-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.h2database</groupId>
        <artifactId>h2</artifactId>
        <scope>runtime</scope>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
        </plugin>
    </plugins>
</build>

CustomerWriter — This is the custom writer we've created to write the customer data into MongoDB. A custom writer also gives us the flexibility to perform more complex operations if needed.

import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;

import com.example.domain.Customer;
import com.example.repository.CustomerRepository;

public class CustomerWriter implements ItemWriter<Customer> {

    @Autowired
    private CustomerRepository customerRepository;

    @Override
    public void write(List<? extends Customer> customers) throws Exception {
        // Each call receives one chunk of items; persist them in a single batch.
        customerRepository.saveAll(customers);
    }
}

CustomerRepository — This is the Mongo repository that talks to the database.

import org.springframework.data.mongodb.repository.MongoRepository;

import com.example.domain.Customer;

public interface CustomerRepository extends MongoRepository<Customer, Long> {
}

Customer — This is the Mongo document class that holds the business data.

import java.time.LocalDate;

import javax.xml.bind.annotation.XmlRootElement;

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.core.mapping.Field;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

@AllArgsConstructor
@NoArgsConstructor
@Builder
@Data
@XmlRootElement(name = "Customer")
@Document
public class Customer {

    @Id
    private Long id;

    @Field
    private String firstName;

    @Field
    private String lastName;

    @Field
    private LocalDate birthdate;
}

CustomerConverter — Here we implement XStream's Converter interface. Converter implementations are responsible for marshalling Java objects to and from textual data; if an exception occurs during processing, a ConversionException should be thrown. When working with the high-level com.thoughtworks.xstream.XStream facade directly, you can register new converters using the XStream.registerConverter() method; in this project the converter is registered through Spring's XStreamMarshaller instead (see JobConfiguration below).

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

import com.example.domain.Customer;
import com.thoughtworks.xstream.converters.Converter;
import com.thoughtworks.xstream.converters.MarshallingContext;
import com.thoughtworks.xstream.converters.UnmarshallingContext;
import com.thoughtworks.xstream.io.HierarchicalStreamReader;
import com.thoughtworks.xstream.io.HierarchicalStreamWriter;

public class CustomerConverter implements Converter {

    private static final DateTimeFormatter DT_FORMATTER =
            DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss");

    @Override
    public boolean canConvert(Class type) {
        return type.equals(Customer.class);
    }

    @Override
    public void marshal(Object source, HierarchicalStreamWriter writer, MarshallingContext context) {
        // This job only reads XML, so marshalling is intentionally left empty.
    }

    @Override
    public Object unmarshal(HierarchicalStreamReader reader, UnmarshallingContext context) {
        // moveDown()/moveUp() walk the child elements in document order, so this
        // assumes each <customer> fragment lists id, firstName, lastName and
        // birthdate in exactly that order.
        Customer customer = new Customer();

        reader.moveDown();
        customer.setId(Long.valueOf(reader.getValue()));
        reader.moveUp();

        reader.moveDown();
        customer.setFirstName(reader.getValue());
        reader.moveUp();

        reader.moveDown();
        customer.setLastName(reader.getValue());
        reader.moveUp();

        reader.moveDown();
        customer.setBirthdate(LocalDate.parse(reader.getValue(), DT_FORMATTER));
        reader.moveUp();

        return customer;
    }
}
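One detail worth noting in the converter above: the formatter pattern includes a time-of-day component, yet the target field is a LocalDate. This works because LocalDate.parse with a formatter simply discards any parsed time fields. A small standalone check (the class name here is ours, not part of the project):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateParseDemo {

    // Same pattern as CustomerConverter: the text carries a time component,
    // but LocalDate.parse keeps only the date fields.
    static final DateTimeFormatter DT_FORMATTER =
            DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss");

    static LocalDate toDate(String raw) {
        return LocalDate.parse(raw, DT_FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(toDate("10-10-1988 00:00:00")); // 1988-10-10
    }
}
```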

JobConfiguration — This is the main configuration class for the batch job. It declares the beans that perform the individual tasks:

StaxEventItemReader — Item reader for reading XML input based on StAX. It extracts fragments from the input XML document which correspond to records for processing. The fragments are wrapped with StartDocument and EndDocument events so that the fragments can be further processed like standalone XML documents. The implementation is not thread-safe.

CustomerWriter — The custom class shown earlier, which writes the data to MongoDB.

step1 — This step wires up the ItemReader and ItemWriter; an ItemProcessor is optional, so we've skipped it here.
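If a processor were needed, it would sit between the reader and the writer, transforming or filtering each item. A plain-Java sketch of what that step does (the class name and the normalization logic are illustrative, not from this project; Spring's actual `ItemProcessor` interface is omitted so the snippet stands alone):

```java
// Hypothetical sketch of an ItemProcessor's role: transform each item,
// or drop it from the chunk by returning null.
public class NameNormalizer {

    public String process(String lastName) {
        if (lastName == null || lastName.isEmpty()) {
            return null; // returning null filters the item out of the chunk
        }
        return lastName.trim().toUpperCase();
    }
}
```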

Job — Batch domain object representing a job. Job is an explicit abstraction representing the configuration of a job specified by a developer. It should be noted that restart policy is applied to the job as a whole and not to a step.

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.xml.StaxEventItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.oxm.xstream.XStreamMarshaller;

import com.example.converter.CustomerConverter; // adjust to the converter's actual package
import com.example.domain.Customer;
import com.example.writer.CustomerWriter;

@Configuration
public class JobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public StaxEventItemReader<Customer> customerItemReader() {
        // Map the <customer> fragment root element to the Customer class.
        Map<String, Class<?>> aliases = new HashMap<>();
        aliases.put("customer", Customer.class);

        XStreamMarshaller unmarshaller = new XStreamMarshaller();
        unmarshaller.setAliases(aliases);
        unmarshaller.setConverters(new CustomerConverter());

        StaxEventItemReader<Customer> reader = new StaxEventItemReader<>();
        reader.setName("customerItemReader"); // needed so the reader can save its state in the ExecutionContext
        reader.setResource(new ClassPathResource("/data/customer.xml"));
        reader.setFragmentRootElementName("customer");
        reader.setUnmarshaller(unmarshaller);

        return reader;
    }

    @Bean
    public CustomerWriter customerWriter() {
        return new CustomerWriter();
    }

    @Bean
    public Step step1() throws Exception {
        return stepBuilderFactory.get("step1")
                .<Customer, Customer>chunk(200)
                .reader(customerItemReader())
                .writer(customerWriter())
                .build();
    }

    @Bean
    public Job job() throws Exception {
        return jobBuilderFactory.get("job")
                .start(step1())
                .build();
    }
}

application.properties
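The properties file itself isn't reproduced here. Assuming a local MongoDB instance with default settings, a minimal configuration might look like this (the property keys are standard Spring Boot keys; host, port, and database name are assumptions to adjust for your environment):

```properties
spring.data.mongodb.host=localhost
spring.data.mongodb.port=27017
spring.data.mongodb.database=test
```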

Customer.xml — This is sample data.
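The sample file isn't reproduced here either. Based on the converter's expected field order (id, firstName, lastName, birthdate with a dd-MM-yyyy HH:mm:ss timestamp), the reader's fragment root element, and the first record in the output below, a matching fragment would look something like this (the wrapping <customers> element name is an assumption):

```xml
<customers>
    <customer>
        <id>1</id>
        <firstName>John</firstName>
        <lastName>Doe</lastName>
        <birthdate>10-10-1988 00:00:00</birthdate>
    </customer>
</customers>
```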

MainApp — SpringBatchMongodbApplication is run as a standard Spring Boot application.

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;
import org.springframework.data.mongodb.repository.config.EnableMongoRepositories;

@SpringBootApplication(exclude = {DataSourceAutoConfiguration.class})
@EnableBatchProcessing
@EnableMongoRepositories(basePackages = "com.example.repository")
public class SpringBatchMongodbApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringBatchMongodbApplication.class, args);
    }
}

Output:

/* 1 */
{
"_id" : NumberLong(1),
"firstName" : "John",
"lastName" : "Doe",
"birthdate" : ISODate("1988-10-09T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 2 */
{
"_id" : NumberLong(2),
"firstName" : "James",
"lastName" : "Moss",
"birthdate" : ISODate("1991-03-31T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 3 */
{
"_id" : NumberLong(3),
"firstName" : "Jonie",
"lastName" : "Gamble",
"birthdate" : ISODate("1982-07-20T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 4 */
{
"_id" : NumberLong(4),
"firstName" : "Mary",
"lastName" : "Kline",
"birthdate" : ISODate("1973-08-06T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 5 */
{
"_id" : NumberLong(5),
"firstName" : "William",
"lastName" : "Lockhart",
"birthdate" : ISODate("1994-04-03T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 6 */
{
"_id" : NumberLong(6),
"firstName" : "John",
"lastName" : "Doe",
"birthdate" : ISODate("1988-10-09T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 7 */
{
"_id" : NumberLong(7),
"firstName" : "Kristi",
"lastName" : "Dukes",
"birthdate" : ISODate("1983-09-16T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 8 */
{
"_id" : NumberLong(8),
"firstName" : "Angel",
"lastName" : "Porter",
"birthdate" : ISODate("1980-12-14T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 9 */
{
"_id" : NumberLong(9),
"firstName" : "Mary",
"lastName" : "Johnston",
"birthdate" : ISODate("1987-07-06T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 10 */
{
"_id" : NumberLong(10),
"firstName" : "Linda",
"lastName" : "Rodriguez",
"birthdate" : ISODate("1991-09-15T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 11 */
{
"_id" : NumberLong(11),
"firstName" : "Phillip",
"lastName" : "Lopez",
"birthdate" : ISODate("1965-12-17T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
/* 12 */
{
"_id" : NumberLong(12),
"firstName" : "Peter",
"lastName" : "Dixon",
"birthdate" : ISODate("1996-05-08T18:30:00.000Z"),
"_class" : "com.example.domain.Customer"
}
