Spring Cloud Data Flow (SCDF) is a toolkit for building data integration and real-time data processing pipelines. The SCDF server uses Spring Cloud Deployer to deploy data pipelines onto modern runtimes such as Cloud Foundry (PCF).

In this tutorial, let’s create a simple http|log stream which consumes a payload over HTTP and prints it. We shall use the out-of-the-box http application, a REST service that receives data over HTTP and pushes it to a queue, and the out-of-the-box log application, which consumes the data from the queue and prints it to the log file. We also need the Spring Cloud Data Flow (SCDF) Server to create and deploy the stream to Cloud Foundry; downloading and installing it is part of this tutorial as well.

Table of contents

  1. Prerequisites
  2. Add Services from PCF Marketplace
  3. Download and Deploy SCDF to PCF
  4. Download Spring Cloud Dataflow Shell Application
  5. Create and deploy a sample http|log Stream
  6. Test the Stream
  7. Summary

1. Prerequisites

  • JDK 8
  • An account on Pivotal Cloud Foundry (PCF). You can create one here
  • PCF Command Line Interface (CLI) installed on your computer. PCF CLI can be found in tools section of your PCF account
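Before moving on, it can help to confirm both prerequisite tools are on the PATH. A minimal, non-fatal check (tool names only; it does not verify the JDK is version 8):

```shell
# Check that the required CLIs are available before starting (non-fatal)
missing=""
for tool in java cf; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "Missing tools:$missing"
else
  echo "All prerequisites found"
fi
```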

2. Add Services from PCF Marketplace

SCDF Server needs redis, rabbitmq and mysql services. Let’s create them before we install SCDF Server.

Log into your PCF account using cf command

Replace <email>, <password>, <org> and <space> with values specific to your cloudfoundry account.

$ cf login -a api.run.pivotal.io -u "<email>" -p "<password>"  -o "<org>" -s "<space>"

API endpoint: api.run.pivotal.io
Targeted org <org>
Targeted space <space>

API endpoint:   https://api.run.pivotal.io (API version: 2.128.0)
User:           <email>
Org:            <org>
Space:          <space>

Add the required services from marketplace for SCDF Server

  • SCDF server needs a valid Redis store for its analytic repository.
  • It also needs an RDBMS for storing stream/task definitions, application registration, and for job repositories.
  • RabbitMQ is used as the messaging middleware between streaming apps and is bound to each deployed streaming app. Kafka is another option, but let’s stick with RabbitMQ for the purposes of this tutorial.

For the above purposes, let’s create the rabbitmq, redis and mysql services from the marketplace using the below cf commands.

$ cf create-service cloudamqp lemur my_rabbit
Creating service instance my_rabbit in org <org> / space <space> as <email>...

$ cf create-service rediscloud 30mb my_redis
Creating service instance my_redis in org <org> / space <space> as <email>...

$ cf create-service cleardb spark my_mysql
Creating service instance my_mysql in org <org> / space <space> as <email>...

Validate that all three services are created successfully

$ cf services
Getting services in org <org> / space <space> as <email>....

name        service      plan    bound apps   last operation
my_mysql    cleardb      spark                create succeeded
my_rabbit   cloudamqp    lemur                create succeeded
my_redis    rediscloud   30mb                 create succeeded

3. Download and Deploy SCDF to PCF

Download SCDF server jar file for cloudfoundry

Let’s download the spring-cloud-dataflow-server-cloudfoundry-1.7.3.RELEASE.jar file from the Spring repo using the wget command.

$ wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-server-cloudfoundry/1.7.3.RELEASE/spring-cloud-dataflow-server-cloudfoundry-1.7.3.RELEASE.jar

Version 1.7.3.RELEASE was the latest at the time of writing.

Let’s provide configuration details, such as credentials for the Cloud Foundry instance, so that the SCDF server can itself spawn applications. Let’s specify these configuration details in a manifest.yml file.

applications:
- name: data-flow-server
  random-route: true
  memory: 2G
  disk_quota: 2G
  instances: 1
  path: spring-cloud-dataflow-server-cloudfoundry-1.7.3.RELEASE.jar
  routes:
  - route: codeaches-scdf-server.cfapps.io
  env:
    SPRING_APPLICATION_NAME: data-flow-server
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL: https://api.run.pivotal.io
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG: {org}
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE: {space}
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME: {email}
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD: {password}
    # my_rabbit is bound to each deployed streaming app
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES: my_rabbit
    SPRING_APPLICATION_JSON: '{"maven":{"remote-repositories":{"repo1":{"url":"https://repo.spring.io/libs-release"},"repo2":{"url":"https://oss.sonatype.org/content/repositories/snapshots"}}}}'
    security.basic.enabled: true
    security.user.name: user001
    security.user.password: pass001
    security.user.role: VIEW,CREATE,MANAGE
  services:
  - my_mysql
  - my_redis

You need to replace {org}, {space}, {email} and {password} with values specific to your Cloud Foundry account. You will also need to replace codeaches with a name of your choice.
Note that basic security is enabled for the SCDF server.

Deploy SCDF server jar file to cloudfoundry

Deploy spring-cloud-dataflow-server-cloudfoundry-1.7.3.RELEASE.jar to PCF using the cf push command.

$ cf push -f manifest.yml

Validate the SCDF server deployment

Verify the SCDF Server deployment status on cloudfoundry.

$ cf apps
Getting apps in org <org> / space <space> as <email>...

name               requested state   instances   memory   disk   urls
data-flow-server   started           1/1         2G       2G     codeaches-scdf-server.cfapps.io
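As a further check, you can hit the SCDF server’s /about REST endpoint with the basic-auth credentials configured in manifest.yml; a JSON response confirms the server is up. The route below is the one used in this tutorial, so substitute your own:

```shell
$ curl -u user001:pass001 https://codeaches-scdf-server.cfapps.io/about
```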

4. Download Spring Cloud Dataflow Shell Application

Spring Cloud Dataflow Shell is a command line interface (CLI) which can be used to connect to SCDF Server. We shall use this CLI to deploy streams.

Let’s download the SCDF shell jar file using wget command.

$ wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-shell/1.7.3.RELEASE/spring-cloud-dataflow-shell-1.7.3.RELEASE.jar

Connect to SCDF Server from SCDF shell

Start the spring-cloud-dataflow-shell spring boot application.

$ java -jar spring-cloud-dataflow-shell-1.7.3.RELEASE.jar
  ____                              ____ _                __
 / ___| _ __  _ __(_)_ __   __ _   / ___| | ___  _   _  __| |
 \___ \| '_ \| '__| | '_ \ / _` | | |   | |/ _ \| | | |/ _` |
  ___) | |_) | |  | | | | | (_| | | |___| | (_) | |_| | (_| |
 |____/| .__/|_|  |_|_| |_|\__, |  \____|_|\___/ \__,_|\__,_|
  ____ |_|    _          __|___/                 __________
 |  _ \  __ _| |_ __ _  |  ___| | _____      __  \ \ \ \ \ \
 | | | |/ _` | __/ _` | | |_  | |/ _ \ \ /\ / /   \ \ \ \ \ \
 | |_| | (_| | || (_| | |  _| | | (_) \ V  V /    / / / / / /
 |____/ \__,_|\__\__,_| |_|   |_|\___/ \_/\_/    /_/_/_/_/_/


Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".

Connect to the SCDF server using the route generated by Cloud Foundry for it. We need to pass the SCDF server credentials, since we have enabled authentication.

server-unknown:>dataflow config server --uri "https://codeaches-scdf-server.cfapps.io" --username "user001" --password "pass001" --skip-ssl-validation "true"

Shell mode: classic, Server mode: classic
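Alternatively, the connection details can be passed as command-line arguments when launching the shell, which skips the interactive dataflow config server step. This sketch uses the same route and credentials as above:

```shell
$ java -jar spring-cloud-dataflow-shell-1.7.3.RELEASE.jar \
    --dataflow.uri=https://codeaches-scdf-server.cfapps.io \
    --dataflow.username=user001 \
    --dataflow.password=pass001 \
    --dataflow.skip-ssl-validation=true
```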

5. Create and deploy a sample http|log Stream

Spring provides a few out-of-the-box source and sink Spring Boot applications (the Spring Cloud Stream App Starters) which can be used for stream creation. Let’s register the out-of-the-box http and log apps, specific to the RabbitMQ messaging broker, in the SCDF server.

dataflow:>app register --name http --type source --uri maven://org.springframework.cloud.stream.app:http-source-rabbit:2.0.3.RELEASE
Successfully registered application 'source:http'

dataflow:>app register --name log --type sink --uri maven://org.springframework.cloud.stream.app:log-sink-rabbit:2.0.2.RELEASE
Successfully registered application 'sink:log'
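You can verify both registrations with the shell’s app list command; the registered http source and log sink should appear in its output:

```shell
dataflow:>app list
```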

Let’s utilize the registered http and log apps to create the http|log stream. This stream, httpLogStream, accepts an HTTP POST request and prints its body to the log file.

dataflow:>stream create --name httpLogStream --definition "http | log" --deploy
Created new stream 'httpLogStream'
Deployment request has been sent
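Deployment is asynchronous, so it can take a minute or two. The shell’s stream list command shows each stream’s definition and current status, which should eventually read deployed:

```shell
dataflow:>stream list
```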

Once the stream is created and deployed successfully, PCF creates random routes (URLs) for both the http and log applications, which can be validated using the cf apps command.

$ cf apps
Getting apps in org <org> / space <space> as <email>

name                                          requested state   instances   memory   disk   urls
data-flow-server-hd6lIb0-httpLogStream-http   started           1/1         1G       1G     data-flow-server-hd6lIb0-httpLogStream-http.cfapps.io
data-flow-server-hd6lIb0-httpLogStream-log    started           1/1         1G       1G     data-flow-server-hd6lIb0-httpLogStream-log.cfapps.io

6. Test the Stream

Tail the log of log application

Tail the log of data-flow-server-hd6lIb0-httpLogStream-log application using cf command.

cf logs data-flow-server-hd6lIb0-httpLogStream-log
Retrieving logs for app data-flow-server-hd6lIb0-httpLogStream-log in org <org> / space <space> as <email>...

Post a sample message to the stream

Post a sample hello world message to the http application using the route data-flow-server-hd6lIb0-httpLogStream-http.cfapps.io as shown below. The message will be picked up by the http app and passed to the log application.

$ curl -i -H "Content-Type:application/text" -X POST -d 'hello world' https://data-flow-server-hd6lIb0-httpLogStream-http.cfapps.io

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    15    0     0  100    15      0     21 --:--:-- --:--:-- --:--:--    21HTTP/1.1 202 Accepted
Date: Wed, 16 Jan 2019 04:43:55 GMT
X-Vcap-Request-Id: f0282b62-c09f-4c23-4e90-0f374ba2cca9
Content-Length: 0
Connection: keep-alive

Once the message is posted successfully, hello world will be printed in the logs of log application.

2019-01-16T06:39:43.77-0700 [APP/PROC/WEB/0] OUT 2019-01-16 13:39:43.758  INFO 14 --- [httpLogStream-1] ta-flow-server-hd6lIb0-httpLogStream-log : hello world

7. Summary

Congratulations! You just deployed an SCDF server on PCF and used it to create and deploy a http|log stream.
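When you are done experimenting, you can tear everything down. A cleanup sketch using the names from this tutorial (destroy the stream first, so its apps are deleted before the services they use):

```shell
dataflow:>stream destroy --name httpLogStream

$ cf delete data-flow-server -f
$ cf delete-service my_rabbit -f
$ cf delete-service my_redis -f
$ cf delete-service my_mysql -f
```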