Batch Processing and Stream Processing (Async & Sync Messaging)

  • Batch processing takes a large amount of input data, processes it in a background job, and delivers the result later, for example the next day
  • Fault tolerance: failed tasks are detected and re-executed
  • Idempotent operations: running a job several times has the same effect as running it once
  • Common examples: backup systems, compressing logs
  • Split a large data set (possibly petabytes) into smaller chunks for a specific task, and distribute those chunks across server machines so they are processed in parallel
  • Processing runs in three phases: Map, Shuffle, and Reduce
  • If a job fails, the batch processor simply restarts that job
  • If a job gets too big, it can be split into smaller parts called micro-batches
  • Asynchronous communication
  • Synchronous communication
  • Batch processing: converts the target data file into a set of records; a piece of data can be produced by multiple jobs
  • Stream processing: converts the target data into a set of events; each event is created by a single producer (publisher)
  • Use a checkpoint ID or a timestamp to label stream jobs and record their order
  • If a job fails, roll back to the last checkpoint
  • Asynchronous messaging: the publisher puts the data it produces on a message bus, and subscribers read messages from the bus rather than directly from the publisher; responses can also be queued up and sent together as a batch
  • Drop messages (for video data, dropping some packets is often the most practical option)
  • Buffer messages in a queue
  • Flow control (when a slow consumer cannot keep up with a faster producer, block the producer until the consumer can receive more data)
  • Load balancing: each message goes to one of several receivers, so adding receivers lets messages be processed in parallel
  • Fan-out: each message is broadcast to several receivers, which process it without affecting each other
  • Communication between two software components in which one component waits for the other to respond.
  • Example: email and web messages
  • Packet-based messaging (UDP: User Datagram Protocol for data transfer)
  • Stream-based messaging (TCP: Transmission Control Protocol for data transfer)
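  • UDP usually has lower latency and overhead than TCP because it has no connection setup, acknowledgments, or retransmission
  • UDP does not guarantee that every message is delivered or that messages arrive in order, which is acceptable for video distribution, where an occasional lost packet is tolerable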
  • The first step is to create a server socket: ServerSocket serversocket = new ServerSocket(9000);
  • After creating the server socket, the server should start to wait for a connection: Socket socket = serversocket.accept();
  • The client should request a connection to the server using the server name and port: Socket socket = new Socket(servername, port);
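
The three socket steps above can be sketched as one small round trip in Java. This is only a minimal sketch under assumptions that are not in the original: the class name EchoDemo, the roundTrip helper, and the "echo: " reply prefix are hypothetical, and port 0 (any free port chosen by the OS) is used instead of 9000 so the demo does not clash with other programs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; the name and structure are illustrative only.
class EchoDemo {

    // Sends one line to a throwaway local server and returns the server's reply.
    static String roundTrip(String message) throws Exception {
        // Step 1: create the server socket. Port 0 asks the OS for a free port
        // (an assumption for this demo; the text above uses 9000).
        try (ServerSocket serverSocket = new ServerSocket(0)) {
            int port = serverSocket.getLocalPort();

            // Step 2: the server waits for a connection. accept() blocks until
            // a client connects, so it runs on a background thread here.
            Thread server = new Thread(() -> {
                try (Socket socket = serverSocket.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(socket.getInputStream()));
                     PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
                    // Read one line from the client and send it back with a prefix.
                    out.println("echo: " + in.readLine());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            server.start();

            // Step 3: the client requests a connection using the server name and port.
            try (Socket socket = new Socket("localhost", port);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()));
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
                out.println(message);          // autoflush sends the line immediately
                String reply = in.readLine();  // wait for the server's answer
                server.join();                 // let the server thread finish cleanly
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello")); // prints "echo: hello"
    }
}
```

Because the client blocks in readLine() until the server answers, this exchange also illustrates the synchronous style described earlier: one component waits for the other to respond over a TCP stream.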

Hanwen Zhang

Full-Stack Software Engineer at a Healthcare Tech Company | Document My Coding Journey | Improve My Knowledge | Share Coding Concepts in a Simple Way