Spring Batch — read XML

Prateek
5 min readDec 28, 2019

In this post, we will show you how to use Spring Batch to read an XML file with your ItemReader using StaxEventItemReader and write its data to MySQL using Custom ItemWriter with JpaRepository.

Custom ItemReader or ItemWriter is a class where we write our own way of reading or writing data. In Custom Reader we are required to handle the chunking logic as well. This comes in handy if our reading logic is complex and cannot be handled using Default ItemReader provided by spring.

Tools and libraries used:

Maven 3+
JDK 1.8
Spring Boot 2.2.2.RELEASE
Spring Data Jdbc
Spring OXM
MySQL
xstream

Maven Dependency — I have used latest version of Spring Boot as of today, however this parent version will automatically pull the latest version we used in dependency section.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.2.2.RELEASE</version>
<relativePath /> <!-- lookup parent from repository -->
</parent>
<groupId>com.example</groupId>
<artifactId>readingXML</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>readingXML</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>1.8</java.version>
<maven-jar-plugin.version>3.1.1</maven-jar-plugin.version>
</properties>
<dependencies>
<!-- Spring Starter Batch -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<!-- Spring OXM -->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-oxm</artifactId>
</dependency>
<!-- Spring Starter JDBC -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<!-- H2 -->
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<scope>runtime</scope>
</dependency>
<!-- MYSQL -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<scope>runtime</scope>
</dependency>
<!-- XSTREAM -->
<dependency>
<groupId>com.thoughtworks.xstream</groupId>
<artifactId>xstream</artifactId>
<version>1.4.7</version>
</dependency>
<!-- Lombok -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.batch</groupId>
<artifactId>spring-batch-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>

CustomerConverter — We’ve implement the Converter interface. This class is used to Converter implementations are responsible marshalling Java objects to/from textual data. If an exception occurs during processing, ConversionException should be thrown. If working with the high level com.thoughtworks.xstream.XStream facade, you can register new converters using the XStream.registerConverter() method.

package com.example.config;import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import com.example.domain.Customer;
import com.thoughtworks.xstream.converters.Converter;
import com.thoughtworks.xstream.converters.MarshallingContext;
import com.thoughtworks.xstream.converters.UnmarshallingContext;
import com.thoughtworks.xstream.io.HierarchicalStreamReader;
import com.thoughtworks.xstream.io.HierarchicalStreamWriter;
public class CustomerConverter implements Converter {
private static final DateTimeFormatter DT_FORMATTER = DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss");

@Override
public boolean canConvert(Class type) {
return type.equals(Customer.class);
}
@Override
public void marshal(Object source, HierarchicalStreamWriter writer, MarshallingContext context) {
// Don't do anything
}
@Override
public Object unmarshal(HierarchicalStreamReader reader, UnmarshallingContext context) {
reader.moveDown();
Customer customer = new Customer();
customer.setId(Long.valueOf(reader.getValue()));

reader.moveUp();
reader.moveDown();
customer.setFirstName(reader.getValue());

reader.moveUp();
reader.moveDown();
customer.setLastName(reader.getValue());

reader.moveUp();
reader.moveDown();
customer.setBirthdate(LocalDate.parse(reader.getValue(), DT_FORMATTER));

return customer;
}
}

MyJobConfiguration — This class will help to avoid creating the Spring Batch metadata tables as we don’t need to actually trigger job which is dependent on database.

package com.example.config;
import org.springframework.batch.core.configuration.annotation.DefaultBatchConfigurer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class MyJobConfiguration extends DefaultBatchConfigurer {
@Override
protected JobRepository createJobRepository() throws Exception {
MapJobRepositoryFactoryBean factoryBean = new MapJobRepositoryFactoryBean();
factoryBean.afterPropertiesSet();
return factoryBean.getObject();
}
}

Customer — This class is JPA Entity class.

package com.example.domain;
import java.time.LocalDateTime;import javax.persistence.*;
import javax.xml.bind.annotation.XmlRootElement;
import lombok.*;
@AllArgsConstructor
@NoArgsConstructor
@Builder
@Data
@XmlRootElement(name = "Customer")
@Entity
public class Customer {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String firstName;
private String lastName;
private LocalDateTime birthdate;
}

JobConfiguration — This is main class responsible for performing the batch job. In this class we used various Beans to perform the individual task.

StaxEventItemReader — Item reader for reading XML input based on StAX. It extracts fragments from the input XML document which correspond to records for processing. The fragments are wrapped with StartDocument and EndDocument events so that the fragments can be further processed like standalone XML documents. The implementation is not thread-safe.

ItemWriter — This is custom ItemWriter used to simply write the data in MySQL. Here we have complete control on the way to write the data.

step1 — This step configures ItemReader and ItemWriter, however ItemProcessor is optional step, which we’ve skip.

Job — Batch domain object representing a job. Job is an explicit abstraction representing the configuration of a job specified by a developer. It should be noted that restart policy is applied to the job as a whole and not to a step.

package com.example.config;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.ItemPreparedStatementSetter;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.xml.StaxEventItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.oxm.xstream.XStreamMarshaller;
import com.example.domain.Customer;
@Configuration
public class JobConfiguration {
@Autowired
private JobBuilderFactory jobBuilderFactory;

@Autowired
private StepBuilderFactory stepBuilderFactory;

@Autowired
private DataSource dataSource;

@Bean
public StaxEventItemReader<Customer> customerItemReader(){
Map<String, Class> aliases = new HashMap<>();
aliases.put("customer", Customer.class);

CustomerConverter converter = new CustomerConverter();
XStreamMarshaller ummarshaller = new XStreamMarshaller();
ummarshaller.setAliases(aliases);
ummarshaller.setConverters(converter);

StaxEventItemReader<Customer> reader = new StaxEventItemReader<>();
reader.setResource(new ClassPathResource("/data/customer.xml"));
reader.setFragmentRootElementName("customer");
reader.setUnmarshaller(ummarshaller);

return reader;
}

@Bean
public Step step1() {
return stepBuilderFactory.get("step1")
.<Customer, Customer>chunk(200)
.reader(customerItemReader())
.writer(writer())
.build();
}

@Bean
public Job job() {
return jobBuilderFactory.get("job")
.start(step1())
.build();
}

@Bean
public JdbcBatchItemWriter<Customer> writer() {
JdbcBatchItemWriter<Customer> writer = new JdbcBatchItemWriter<>();
writer.setDataSource(this.dataSource);
writer.setSql("INSERT INTO customer (id, birthdate, first_name, last_name) VALUES (?,?,?,?)");
writer.setItemPreparedStatementSetter(new CustomerItemPreparedStmSetter());
return writer;
}

private class CustomerItemPreparedStmSetter implements ItemPreparedStatementSetter<Customer> {
public void setValues(Customer result, PreparedStatement ps) throws SQLException {
ps.setLong(1, result.getId());
ps.setDate(2, java.sql.Date.valueOf( result.getBirthdate() ));
ps.setString(3, result.getFirstName());
ps.setString(4, result.getLastName());
}
}
}

application.properties — The configurations for connecting to database.

spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.url=jdbc:mysql://localhost:3306/test
spring.datasource.username=root
spring.datasource.password=root

spring.jpa.hibernate.ddl-auto=create
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true

CustomerRepository — This is JpaRepostory class responsible for talking to database.

package com.example.repository;import org.springframework.data.jpa.repository.JpaRepository;import com.example.domain.Customer;
public interface CustomerRepository extends JpaRepository<Customer, Long>{
}

Customer.xml — This is sample data to load into DB.

<?xml version="1.0" encoding="UTF-8" ?>
<customers>
<customer>
<id>1</id>
<firstName>John</firstName>
<lastName>Doe</lastName>
<birthdate>10-10-1988 19:43:23</birthdate>
</customer>
<customer>
<id>2</id>
<firstName>James</firstName>
<lastName>Moss</lastName>
<birthdate>01-04-1991 10:20:23</birthdate>
</customer>
<customer>
<id>3</id>
<firstName>Jonie</firstName>
<lastName>Gamble</lastName>
<birthdate>21-07-1982 11:12:13</birthdate>
</customer>
<customer>
<id>4</id>
<firstName>Mary</firstName>
<lastName>Kline</lastName>
<birthdate>07-08-1973 11:27:42</birthdate>
</customer>
<customer>
<id>5</id>
<firstName>William</firstName>
<lastName>Lockhart</lastName>
<birthdate>04-04-1994 04:15:11</birthdate>
</customer>
<customer>
<id>6</id>
<firstName>John</firstName>
<lastName>Doe</lastName>
<birthdate>10-10-1988 19:43:23</birthdate>
</customer>
<customer>
<id>7</id>
<firstName>Kristi</firstName>
<lastName>Dukes</lastName>
<birthdate>17-09-1983 21:22:23</birthdate>
</customer>
<customer>
<id>8</id>
<firstName>Angel</firstName>
<lastName>Porter</lastName>
<birthdate>15-12-1980 18:09:09</birthdate>
</customer>
<customer>
<id>9</id>
<firstName>Mary</firstName>
<lastName>Johnston</lastName>
<birthdate>07-07-1987 19:43:03</birthdate>
</customer>
<customer>
<id>10</id>
<firstName>Linda</firstName>
<lastName>Rodriguez</lastName>
<birthdate>16-09-1991 09:13:43</birthdate>
</customer>
<customer>
<id>11</id>
<firstName>Phillip</firstName>
<lastName>Lopez</lastName>
<birthdate>18-12-1965 11:10:09</birthdate>
</customer>
<customer>
<id>12</id>
<firstName>Peter</firstName>
<lastName>Dixon</lastName>
<birthdate>09-05-1996 19:09:23</birthdate>
</customer>
</customers>

MainApp — Run this as Spring Boot Application.

package com.example;import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
@EnableBatchProcessing
public class ReadingXmlApplication {
public static void main(String[] args) {
SpringApplication.run(ReadingXmlApplication.class, args);
}
}

Result:

As you can see XML data has been loaded into MySQL database. Showing you the details for reference.

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| customer |
+----------------+
1 row in set (0.00 sec)
mysql> select * from customer;
+----+------------+------------+-----------+
| id | birthdate | first_name | last_name |
+----+------------+------------+-----------+
| 1 | 1988-10-10 | John | Doe |
| 2 | 1991-04-01 | James | Moss |
| 3 | 1982-07-21 | Jonie | Gamble |
| 4 | 1973-08-07 | Mary | Kline |
| 5 | 1994-04-04 | William | Lockhart |
| 6 | 1988-10-10 | John | Doe |
| 7 | 1983-09-17 | Kristi | Dukes |
| 8 | 1980-12-15 | Angel | Porter |
| 9 | 1987-07-07 | Mary | Johnston |
| 10 | 1991-09-16 | Linda | Rodriguez |
| 11 | 1965-12-18 | Phillip | Lopez |
| 12 | 1996-05-09 | Peter | Dixon |
+----+------------+------------+-----------+
12 rows in set (0.00 sec)
mysql>

--

--