Meet Jitsu 2.0: The Modern, Open-Source Segment Alternative
Businesses of all sizes need efficient, real-time data collection and processing. Jitsu 2.0 is an open-source alternative to Segment that offers a fully scriptable data ingestion engine and can be set up in minutes. This post walks through the features, installation, configuration, and usage of Jitsu 2.0 for data teams evaluating it.
Introduction to Jitsu 2.0
Jitsu 2.0 is a self-hosted, open-source alternative to Segment, designed for collecting event data from various sources such as websites and apps and streaming it to data warehouses or other services. With its real-time data processing capabilities, Jitsu 2.0 provides modern data teams with a powerful tool to set up data pipelines quickly and efficiently.
Learn more: Jitsu Website
Key Features of Jitsu 2.0
Collect Data in a Snap
Jitsu keeps data collection simple. Implementing it means adding a tag to your website or app, much as you would with Google Analytics or Segment.
Unified Data Without Vendor Lock-In
Jitsu is designed to make your data warehouse the single source of truth. It delivers data to your data warehouses swiftly, ensuring you maintain control over your data without being tied to a specific vendor.
Real-Time Event Streaming
Stream user behavioral data from your apps to your data warehouse in real time, so that it is ready for analysis in minutes, not hours.
Infinite Developer Flexibility
With Jitsu Functions, developers can modify, filter, or augment events before they are stored in the data warehouse. These functions are executed in a JavaScript runtime environment, providing access to the extensive JavaScript ecosystem, including npm packages, libraries, and key-value storage.
Open Source and Self-Hostable
Jitsu is 100% open-source under the MIT license, giving you the freedom to deploy it on your own infrastructure. For those who prefer a managed solution, Jitsu also offers a cloud version.
Automatic User Identity
Jitsu automatically constructs a real-time identity graph that enhances your data as new information is revealed. This feature eliminates the need for complex SQL queries to match user data.
ClickHouse Included
Jitsu supports various data warehouses like Snowflake, BigQuery, Redshift, Postgres, and MySQL. Additionally, it offers ClickHouse, a free, high-performance columnar database.
Custom Domains
To minimize the impact of ad-blockers on data collection, Jitsu can be deployed on your own subdomain, ensuring reliable data capture.
Technical Architecture
Jitsu's architecture is designed for high performance and scalability. It consists of several core components:
- Collector: Captures event data from various sources and streams it to the data pipeline.
- Router: Routes the captured data to different destinations based on predefined rules.
- Storage: Temporarily stores the event data before it is processed and sent to the final destination.
- Destination: The final data warehouse or service where the data is sent for analysis and processing.
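As a rough mental model of how these four components hand events to one another, the sketch below wires them together in plain JavaScript. Every name in it (`createPipeline`, `collect`, `flush`, the shape of a rule) is hypothetical and chosen for illustration only; it is not Jitsu's actual code or API.

```javascript
// Toy model of the collector -> router -> storage -> destination flow.
// All names are illustrative assumptions, not Jitsu internals.
function createPipeline(rules, destinations) {
  const buffer = []; // "Storage": holds routed events until they are flushed

  return {
    // "Collector": accept an incoming event and stamp it with an arrival time
    collect(event) {
      const received = { ...event, receivedAt: new Date().toISOString() };
      // "Router": queue the event for every rule whose predicate matches
      for (const rule of rules) {
        if (rule.match(received)) {
          buffer.push({ destination: rule.destination, event: received });
        }
      }
    },
    // Drain the buffer into the destinations; returns how many were delivered
    flush() {
      let delivered = 0;
      while (buffer.length > 0) {
        const { destination, event } = buffer.shift();
        destinations[destination].push(event); // "Destination": final sink
        delivered += 1;
      }
      return delivered;
    },
  };
}

// Usage: every event goes to the warehouse, identify calls also go to the CRM
const sinks = { warehouse: [], crm: [] };
const pipeline = createPipeline(
  [
    { match: () => true, destination: "warehouse" },
    { match: (e) => e.type === "identify", destination: "crm" },
  ],
  sinks
);
pipeline.collect({ type: "page", url: "https://yourwebsite.com" });
pipeline.collect({ type: "identify", userId: "userId123" });
const deliveredCount = pipeline.flush();
```

Keeping the buffer between routing and delivery is what lets a pipeline of this shape absorb bursts of traffic without blocking the collector.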
Installation and Setup
Setting up Jitsu 2.0 is straightforward, whether you opt for Docker Compose, a scalable deployment, or the cloud version.
Docker Compose
The fastest way to get Jitsu up and running is via Docker Compose. Here’s how to do it:
# Clone the repository
git clone --depth 1 https://github.com/jitsucom/jitsu
cd jitsu/docker
# Copy .env.example to .env, see instructions at https://docs.jitsu.com/self-hosting/quick-start#edit-env-file
cp .env.example .env
# Start the services defined in the compose file
docker compose up
Deploy at Scale
For production deployments, follow the detailed deployment guide in the Jitsu self-hosting documentation.
Jitsu Cloud
For those who prefer a managed solution, Jitsu Cloud is available at use.jitsu.com. It is free for up to 200k events per month and includes a free ClickHouse instance.
Configuration
After installing Jitsu, the next step is configuration. Follow the Quick Start Guide to get familiar with Jitsu concepts and browse the Destination Catalog to see where you can send your data.
Sending Events
Jitsu provides multiple ways to send events to your data pipeline, making it flexible and adaptable to different use cases.
HTML Snippet
You can add an HTML snippet to your website to start capturing events:
<html>
  <head>
    <script async src="https://data.mycompany.com/j.js"></script>
  </head>
</html>
React
For React applications, including Next.js, you can use the dedicated React SDK:
import { Jitsu } from '@jitsu/react';

const jitsu = new Jitsu({
  key: 'YOUR_WRITE_KEY'
});

jitsu.track('page_view');
NPM Package
Jitsu's NPM package is isomorphic and works in server-side Node.js environments as well:
const jitsu = require('@jitsu/sdk');

const client = jitsu({
  key: 'YOUR_WRITE_KEY'
});

client.track('event_name', {
  property1: 'value1',
  property2: 'value2'
});
HTTP API
Jitsu provides a simple HTTP API for sending events:
curl -X POST https://data.yourcompany.com/api/v1/event \
  -H "Content-Type: application/json" \
  -d '{
    "event_type": "page_view",
    "properties": {
      "url": "https://yourwebsite.com"
    }
  }'
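The same request can be issued from JavaScript. The sketch below assumes the endpoint path and payload shape shown in the curl example; `buildEventRequest` and `sendEvent` are hypothetical helper names, not part of any Jitsu SDK, and `sendEvent` relies on the global `fetch` available in browsers and in Node.js 18 or later.

```javascript
// Builds the request shown in the curl example; nothing is sent here.
// `buildEventRequest` and `sendEvent` are hypothetical helper names.
function buildEventRequest(host, eventType, properties = {}) {
  return {
    // Path taken from the curl example; trailing slashes on the host are trimmed
    url: `${host.replace(/\/+$/, "")}/api/v1/event`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ event_type: eventType, properties }),
    },
  };
}

// Sends the event with the global fetch and fails loudly on a non-2xx status.
async function sendEvent(host, eventType, properties) {
  const { url, options } = buildEventRequest(host, eventType, properties);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Event rejected with HTTP ${response.status}`);
  }
  return response;
}

// Example: build (but do not send) the page_view request from the curl call
const request = buildEventRequest("https://data.yourcompany.com/", "page_view", {
  url: "https://yourwebsite.com",
});
```

Separating request construction from sending makes the payload easy to inspect or unit-test without touching the network.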
Segment-Compatible API
If you are already using Segment, Jitsu offers a Segment-compatible API, so existing calls can be carried over with minimal changes:
analytics.identify('userId123', {
  email: 'user@example.com',
  name: 'John Doe'
});
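For orientation, Segment-style calls such as `identify` and `track` are ultimately turned into JSON messages carrying a `type` plus fields like `userId`, `traits`, `event`, and `properties`. The stub below is a minimal sketch of that mapping; `createAnalyticsStub` is a hypothetical name, and real SDKs attach further fields (message IDs, timestamps, context) that are omitted here.

```javascript
// Minimal stand-in for a Segment-style client that records the messages
// it would send. `createAnalyticsStub` is a hypothetical name.
function createAnalyticsStub() {
  const queue = [];
  return {
    queue,
    // identify: ties a user ID to a set of traits
    identify(userId, traits = {}) {
      queue.push({ type: "identify", userId, traits });
    },
    // track: records a named action with optional properties
    track(event, properties = {}) {
      queue.push({ type: "track", event, properties });
    },
    // page: records a page view
    page(name, properties = {}) {
      queue.push({ type: "page", name, properties });
    },
  };
}

const analytics = createAnalyticsStub();
analytics.identify("userId123", { email: "user@example.com", name: "John Doe" });
analytics.track("Signup Completed", { plan: "free" });
```

Code that processes events downstream can then branch on `type` and read fields such as `traits.email` from identify messages.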
Bulker: The Engine Behind Jitsu
Jitsu is built on Bulker, an open-source data warehouse ingestion engine. Bulker can be used as a standalone tool for those comfortable working with low-level APIs, providing the core functionality that powers Jitsu’s data processing capabilities.
Advanced Features
Real-Time Event Streaming
One of the standout features of Jitsu is its ability to stream event data in real time. User behavioral data from your apps becomes available for analysis almost instantly, enabling faster insights and decision-making.
Infinite Developer Flexibility
Jitsu Functions allow developers to modify, filter, or augment events before they are stored in the data warehouse. This flexibility is achieved through a JavaScript runtime environment, providing access to a vast ecosystem of JavaScript tools, including npm packages, libraries, and key-value storage.
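export default async function(event, { log, props, store }) {
  if (event.type === "identify") {
    // Look up whether this email has been seen before
    if (await store.get(`signup/${event.traits.email}`)) {
      log(`User ${event.traits.email} already signed up`);
    } else {
      // First sighting: remember the user, then notify Slack
      await store.set(`signup/${event.traits.email}`, true);
      await fetch(`https://slack.com/api/chat.postMessage`, {
        method: "POST",
        // Slack expects the token as a Bearer header when the body is JSON
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${props.SLACK_TOKEN}`
        },
        body: JSON.stringify({
          channel: props.CHANNEL_ID,
          text: `Hooray! We have a new user ${event.traits.email}`
        })
      });
    }
  }
}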
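To try this kind of check-then-set logic outside of Jitsu, one option is to run it against an in-memory stand-in for the key-value store. The sketch below is an assumption-laden test harness, not Jitsu's runtime: `createMemoryStore`, `onIdentify`, `notify`, and `runDemo` are hypothetical names, the store only mimics asynchronous `get` and `set`, and the Slack call is replaced by an injected callback.

```javascript
// In-memory stand-in for the key-value `store` passed to a function.
// `createMemoryStore` and `onIdentify` are hypothetical names.
function createMemoryStore() {
  const data = new Map();
  return {
    async get(key) { return data.get(key); },
    async set(key, value) { data.set(key, value); },
  };
}

// Reads the signup key first and only writes and notifies on a first sighting.
// The side effect (for example a Slack call) is an injected `notify` callback.
async function onIdentify(event, { log, store, notify }) {
  if (event.type !== "identify") return "ignored";
  const key = `signup/${event.traits.email}`;
  if (await store.get(key)) {
    log(`User ${event.traits.email} already signed up`);
    return "existing";
  }
  await store.set(key, true);
  await notify(`Hooray! We have a new user ${event.traits.email}`);
  return "new";
}

// Runs the handler twice for the same user and once for a non-identify event.
async function runDemo() {
  const logs = [];
  const notifications = [];
  const ctx = {
    log: (message) => logs.push(message),
    store: createMemoryStore(),
    notify: async (text) => notifications.push(text),
  };
  const identify = { type: "identify", traits: { email: "user@example.com" } };
  const results = [
    await onIdentify(identify, ctx),
    await onIdentify(identify, ctx),
    await onIdentify({ type: "page" }, ctx),
  ];
  return { results, logs, notifications };
}
```

Calling `runDemo()` shows that only the first identify event for a given email triggers a notification, while the repeat is merely logged.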
Automatic User Identity
Jitsu automatically constructs a real-time identity graph that augments your data incrementally as new information is uncovered. This feature simplifies the process of maintaining accurate and up-to-date user profiles without complex SQL queries.
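This post does not describe how Jitsu builds its identity graph, so the following is only a generic sketch of the underlying idea: identifiers that appear together on one event are linked, and links accumulate as more events arrive. The `IdentityGraph` class and its methods are hypothetical, and the union-find approach is a common textbook technique chosen for illustration, not a description of Jitsu's implementation.

```javascript
// Toy identity graph: links identifiers seen on the same event using a
// union-find structure. Not Jitsu's implementation; names are hypothetical.
class IdentityGraph {
  constructor() {
    this.parent = new Map(); // identifier -> parent identifier
  }

  // Find the representative identifier for `id`, registering it if new
  find(id) {
    if (!this.parent.has(id)) this.parent.set(id, id);
    let root = id;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every node on the walk directly at the root
    let node = id;
    while (node !== root) {
      const next = this.parent.get(node);
      this.parent.set(node, root);
      node = next;
    }
    return root;
  }

  // Record that all identifiers on one event belong to the same person
  observe(event) {
    const ids = [event.anonymousId, event.userId].filter(Boolean);
    if (ids.length === 0) return;
    const root = this.find(ids[0]);
    for (const id of ids.slice(1)) {
      this.parent.set(this.find(id), root);
    }
  }

  // True when both identifiers have been linked, directly or transitively
  samePerson(a, b) {
    return this.find(a) === this.find(b);
  }
}

// Two anonymous sessions are tied together once each is identified
const graph = new IdentityGraph();
graph.observe({ type: "page", anonymousId: "anon-1" });
graph.observe({ type: "identify", anonymousId: "anon-1", userId: "userId123" });
graph.observe({ type: "page", anonymousId: "anon-2" });
graph.observe({ type: "identify", anonymousId: "anon-2", userId: "userId123" });
```

Here `anon-1` and `anon-2` end up in the same group because both were observed alongside `userId123`, which is the kind of matching that would otherwise require SQL joins after the fact.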
Custom Domains
To minimize the impact of ad-blockers on data collection, Jitsu can be deployed on your own subdomain. This ensures reliable data capture and improves the accuracy of your analytics.
Open Source and Community
Jitsu is fully open-source and licensed under the MIT license, promoting transparency and collaboration. The community around Jitsu is active and growing, with contributions from developers worldwide. This open-source nature allows businesses to customize and extend Jitsu according to their specific needs.
For those interested in contributing to Jitsu, the contributing guidelines provide detailed instructions on how to get started.
Case Studies
Jitsu is trusted by industry leaders for its efficiency and reliability in data collection and processing. A notable example is Investing.com, which uses Jitsu to move data faster and streamline its data operations.
Learn more: Investing.com Case Study
Conclusion
Jitsu 2.0 stands out as a powerful, flexible, and efficient alternative to Segment, providing modern data teams with the tools they need to collect, process, and analyze event data in real time. Its open-source nature, combined with its rich feature set and ease of use, makes it an excellent choice for businesses looking to gain deeper insights into their data without the constraints of vendor lock-in.
Whether you are a startup looking to set up a data pipeline quickly or an enterprise seeking a scalable and customizable data ingestion solution, Jitsu 2.0 offers the capabilities and flexibility to meet your needs. With its active community and continuous development, Jitsu is poised to remain at the forefront of data ingestion technology.
For more information, visit the Jitsu website, explore the documentation, and join the conversation on Slack.