Announcing WebSemaphore V1 and new features

1. Intro

We are pleased to release the next version of WebSemaphore. Sequentially, it should be called v2, but since the previous version had an incomplete feature set, it is more accurately described as a Beta. The current version is fully featured, which makes it v1.

Key changes in this release:

Fine-grained job state control
High-resolution automated job timeouts
Job archive
Improved UI

Below we introduce details on each of the key changes and conclude with a brief example to show how they serve a real-life scenario.

2. Fine-grained job state control

The Beta included a way to acquire a resource and pass control to a processor via an HTTP/WebSocket callback, limiting concurrent throughput as set in the semaphore configuration. However, the only action that could be taken from there was to release the semaphore once the job was done.
This presented multiple challenges for customer setups, so we set out on a refactoring mission to address them.

Job states

The job states fall into three groups: transient, final and archived. Table 1.1 describes each state and the actions that a client/processor can take to transition a job to another state.

	Status	Explanation	Possible actions
Transient	scheduled	Accepted and waiting for resource acquisition	delete
	sending	Acquired resource, waiting to be sent	[internal short-lived state, no actions on user side]
	inflight	Mapping executed and job was sent to processor	release, cancel, delete
Final	done	Job was completed and semaphore released by processor	requeue, delete
	error	Job failed during mapping or callback	requeue, reschedule, delete
	timeout	Job was sent to processor but not released before timeout	requeue, reschedule, release, delete
Archived	archived	The job is considered deleted and will be moved to an archive storage or disposed of

Table 1.1 - Job Statuses mapped to actions available to clients
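
As a rough illustration, Table 1.1 can be written down as a lookup structure. The sketch below is not the WebSemaphore API: the names Status, GROUPS, ALLOWED_ACTIONS, group_of and can are hypothetical, and the empty action set for the archived status is an assumption, since the table lists no actions for it.

```python
from enum import Enum


class Status(str, Enum):
    SCHEDULED = "scheduled"
    SENDING = "sending"
    INFLIGHT = "inflight"
    DONE = "done"
    ERROR = "error"
    TIMEOUT = "timeout"
    ARCHIVED = "archived"


# The three groups of Table 1.1.
GROUPS = {
    "transient": {Status.SCHEDULED, Status.SENDING, Status.INFLIGHT},
    "final": {Status.DONE, Status.ERROR, Status.TIMEOUT},
    "archived": {Status.ARCHIVED},
}

# Actions a client/processor may take in each status, per Table 1.1.
ALLOWED_ACTIONS = {
    Status.SCHEDULED: {"delete"},
    Status.SENDING: set(),  # internal short-lived state, no user actions
    Status.INFLIGHT: {"release", "cancel", "delete"},
    Status.DONE: {"requeue", "delete"},
    Status.ERROR: {"requeue", "reschedule", "delete"},
    Status.TIMEOUT: {"requeue", "reschedule", "release", "delete"},
    Status.ARCHIVED: set(),  # assumption: the table lists no actions here
}


def group_of(status: Status) -> str:
    """Return the group (transient, final or archived) a status belongs to."""
    for group, members in GROUPS.items():
        if status in members:
            return group
    raise ValueError(f"unknown status: {status}")


def can(status: Status, action: str) -> bool:
    """Check whether an action is listed for the given status."""
    return action in ALLOWED_ACTIONS[status]
```

A client could use a check like can(Status.TIMEOUT, "release") to decide which buttons or API calls to offer for a given job.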

State control API

There is no direct way for clients to change a job’s status. Instead, actions are available that apply only to jobs in a corresponding status. Each action transitions the job to a specific status, as listed in Table 1.2.

Action	From Status	To Status	Explanation
acquire		scheduled	Schedule a new job by client
requeue	done, error, timeout	scheduled	Reprocess with a new message id at the end of the channel queue
reschedule	timeout, error	scheduled	Reprocess preserving message id and in-channel order
release	inflight, timeout	done	Set status to done. If the job is inflight, this also releases the lock
cancel	inflight	archived	Move to archived state, keep original state in lastState
delete	all except archived

Table 1.2 - Actions available to client mapped to Job Statuses
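
The transitions in Table 1.2 can be sketched as a small in-memory state machine. This is a simplified model under stated assumptions, not the service implementation: Job, InvalidAction, acquire and apply_action are hypothetical names, message ids are generated with uuid purely for illustration, and the delete action is left out because the table does not state its target status.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional

# action -> (statuses it applies to, resulting status), per Table 1.2.
# "delete" is omitted: its target status is not stated in the table.
TRANSITIONS = {
    "requeue": ({"done", "error", "timeout"}, "scheduled"),
    "reschedule": ({"timeout", "error"}, "scheduled"),
    "release": ({"inflight", "timeout"}, "done"),
    "cancel": ({"inflight"}, "archived"),
}


@dataclass(frozen=True)
class Job:
    message_id: str
    status: str
    last_state: Optional[str] = None


class InvalidAction(Exception):
    """Raised when an action does not apply to the job's current status."""


def acquire() -> Job:
    # Scheduling a new job is how a job enters the state machine.
    return Job(message_id=uuid.uuid4().hex, status="scheduled")


def apply_action(job: Job, action: str) -> Job:
    """Return a new Job reflecting the action, or raise InvalidAction."""
    if action not in TRANSITIONS:
        raise InvalidAction(f"unknown action: {action}")
    allowed_from, target = TRANSITIONS[action]
    if job.status not in allowed_from:
        raise InvalidAction(f"{action} is not available in status {job.status}")
    if action == "requeue":
        # Requeue reprocesses the job under a new message id.
        return replace(job, message_id=uuid.uuid4().hex, status=target)
    if action == "cancel":
        # Cancel archives the job and keeps the original status in last_state.
        return replace(job, status=target, last_state=job.status)
    # Reschedule and release keep the message id unchanged.
    return replace(job, status=target)
```

In this model, rescheduling a timed-out job keeps its message id, while requeueing it yields a new one; queue position is not modelled here.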

To visualize the information above, below is a state diagram. The black and green arrows represent the happy path that was available in the Beta version. The purple arrows represent retry actions (requeue, reschedule), and the yellow arrows represent archive/delete transitions. Finally, the red and lime arrows represent error and timeout transitions that are not available to clients directly but are useful for understanding the job state space. Note that the color codes of the transition arrows correspond to the colors in the Action column of Table 1.2.

Diagram 1.3 - Actions mapped to Job Statuses

High-resolution automated job timeouts

Detailed job tracking allows for the implementation of this highly requested feature. While simple in principle, it requires all of the job state machinery described above to function.

A timer is a simple setting on a semaphore that defines the maximum time a job can remain in the “inflight” state. If the processor does not release the job within that time, the job transitions to the “timeout” state. It is then up to the configuration of the semaphore to decide whether to:

Deactivate the channel
Deactivate the semaphore
Drop (skip) the job and continue processing

The choice will depend on the use case.
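
To make the three options concrete, here is a minimal sketch of how a timeout check might apply them. All names (OnTimeout, Semaphore, check_timeout and their fields) are hypothetical and do not reflect the actual WebSemaphore configuration keys; please refer to the docs for those.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class OnTimeout(Enum):
    DEACTIVATE_CHANNEL = "deactivate_channel"
    DEACTIVATE_SEMAPHORE = "deactivate_semaphore"
    DROP_JOB = "drop_job"


@dataclass
class Semaphore:
    timeout_seconds: float
    on_timeout: OnTimeout
    active: bool = True
    # channel id -> whether the channel is currently active
    channel_active: Dict[str, bool] = field(default_factory=dict)


def check_timeout(sem: Semaphore, channel_id: str, status: str,
                  seconds_inflight: float) -> str:
    """Return the job's status after the timeout check."""
    if status != "inflight" or seconds_inflight < sem.timeout_seconds:
        return status  # nothing to do yet
    if sem.on_timeout is OnTimeout.DEACTIVATE_CHANNEL:
        sem.channel_active[channel_id] = False
    elif sem.on_timeout is OnTimeout.DEACTIVATE_SEMAPHORE:
        sem.active = False
    # DROP_JOB: nothing is deactivated; processing continues with the next job.
    return "timeout"
```

Deactivating the channel or the semaphore trades throughput for ordering guarantees, while dropping the job favors continued processing.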

Notes:

The timeout setting is in seconds; however, the actual precision of the timer is currently about 20 seconds. A timeout will occur no earlier than the configured number of seconds but can execute up to 20 seconds late. This is expected to improve based on demand.
The mapping is also capable of detecting “poisoned pills” and dropping such messages via custom routing. The state in which such messages end up is still to be documented.
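
The timer precision described in the first note amounts to a simple window calculation. The sketch below only restates that arithmetic; timeout_window and TIMER_LAG_SECONDS are illustrative names, and the 20-second figure is the approximate value quoted above.

```python
from typing import Tuple

# Approximate timer lag quoted in the note above; expected to change over time.
TIMER_LAG_SECONDS = 20.0


def timeout_window(sent_at: float, timeout_seconds: float,
                   max_lag: float = TIMER_LAG_SECONDS) -> Tuple[float, float]:
    """Return (earliest, latest) moments at which the timeout may execute."""
    earliest = sent_at + timeout_seconds
    return earliest, earliest + max_lag
```

For a job sent at t = 0 with a 30-second timeout, the timeout executes somewhere between t = 30 and roughly t = 50, so very short timeouts are dominated by the timer lag.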

Archive

In high-traffic situations it’s neither practical nor economically viable to keep a full log of all the jobs ever received in the active storage. Additionally, it’s desirable to delete irrelevant messages altogether.
The archived state of a message can be thought of as the recycle bin of an operating system. In upcoming releases it will be possible to configure the period after which jobs in the done state are automatically moved to the archive or deleted. We expect the archive to be billed as long-term grade storage, and we are happy to hear more customer feedback on the future of this direction.

Improved UI

The UI was improved for overall better UX and to accommodate the new features. Most notably, it is now possible to track individual jobs and act on them, in accordance with the rules laid out in parts 2 and 3 above.

Kanban View

This view is expected to be most useful for real-time monitoring and debugging. The jobs are color-hashed to make it easy to visually track a job across state changes. Clicking Details on a job shows its details and the actions that can be taken, following Tables 1.1 and 1.2 above.

Fig 4.1 - Kanban View

Fig 4.2 - Job details in the Kanban View

Standard (Segmented) View

This view is expected to be most useful during manual reconciliation. The information provided is the same as in the Kanban view, but the layout is better suited to inspecting jobs in a specific state. To that end, the actions are available on the right side of the job details in both collapsed and expanded states.

Fig 4.3 - Job list and details in the Standard View

Scenario

As an abstract scenario, consider a flow where the job sequence is critical. To mitigate processing errors, processor downtime and other failures:

Suspend on error: use the option to suspend the channel or semaphore upon an error or timeout. This minimizes further errors while preserving the job sequence.
Reconcile: use the API or the new UI console to review and requeue, reschedule or archive the failing messages, alongside any adjustments in the downstream systems.
Resume: activate the channel so that processing continues from the point where it stopped, taking into account the changes made during reconciliation.
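
The suspend, reconcile and resume steps can be illustrated with a small in-memory simulation. This is only a toy model of the idea, not the WebSemaphore API: Channel, run, resume and make_flaky_processor are hypothetical names, and leaving the failed job at the head of the queue stands in for a reschedule that preserves in-channel order.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class Channel:
    """Toy ordered channel that suspends itself on the first failure."""

    def __init__(self, jobs: List[str]) -> None:
        self.queue: Deque[str] = deque(jobs)
        self.active = True
        self.done: List[str] = []
        self.failed: Optional[str] = None

    def run(self, process: Callable[[str], None]) -> None:
        while self.active and self.queue:
            job = self.queue[0]
            try:
                process(job)
            except Exception:
                # Suspend on error: the failed job stays at the head of the
                # queue, so the job sequence is preserved.
                self.active = False
                self.failed = job
                return
            self.queue.popleft()
            self.done.append(job)

    def resume(self, process: Callable[[str], None]) -> None:
        # Called after reconciliation; processing restarts at the failed job.
        self.failed = None
        self.active = True
        self.run(process)


def make_flaky_processor(fail_once_on: str) -> Callable[[str], None]:
    """Build a processor that fails the first time it sees one given job."""
    already_failed = set()

    def process(job: str) -> None:
        if job == fail_once_on and job not in already_failed:
            already_failed.add(job)
            raise RuntimeError(f"downstream unavailable for {job}")

    return process
```

Running a channel with jobs a, b, c and a processor that fails once on b leaves a done and the channel suspended at b; after resume, all three jobs complete in their original order.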

Conclusion

For more details on the new API and configuration options please head to the docs.
We hope users find these features a key enabler on their journey to stable and resilient distributed systems.
