Version 1.9
This version includes the following modules: Manager 1.6 / Decision 1.1 / Crossdata 1.5 / Ingestion 0.6 / Sparta 0.11 / Cassandra 2.2.5

Version 0.9.0


Latest version.
Created on 04/03/2016

This version includes the following modules: Manager 1.6 / Decision 1.0 / Crossdata 1.2 / Viewer 0.15 / Ingestion 0.6 / Sparta 0.11 / Cassandra 2.2.5

Created on 04/09/2015

This version includes the following modules: Manager 1.5 / Decision 0.9 / Crossdata 1.1 / Viewer 0.14 / Ingestion 0.5 / Sparta 0.8 / Cassandra 2.2.4

Created on 04/04/2015

This version includes the following modules: Manager 1.4 / Decision 0.8 / Crossdata 1.1 / Viewer 0.11 / Ingestion 0.4 / Sparta 0.7 / Cassandra 2.1



Cassandra’s indexing functionality has been extended to provide near real-time search, as in Elasticsearch or Solr, including full-text search capabilities and free multivariable search. This is achieved through a Lucene-based implementation of Cassandra’s secondary indexes, where each node of the cluster indexes its own data. Stratio Cassandra extends Cassandra’s CQL expressiveness with Lucene’s syntax.

Lucene’s search technology integrated with Cassandra provides:

  • Big data full text search
  • Relevance scoring and sorting
  • General top-k queries
  • Complex boolean queries (and, or, not)
  • Near real-time search
  • CQL3 support
  • Wide rows support
  • Partition and cluster composite keys support
  • Support for indexing columns that are part of the primary key
  • Self-contained distribution
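
As a sketch of how this looks in practice (the exact syntax varies between versions, and the table, column and field names below are purely illustrative), a Lucene index is created as a regular CQL custom index and then queried through plain CQL:

```sql
-- Hypothetical table: create a Lucene-based secondary index over it.
-- The dummy "lucene" column and the schema options follow the 2.x-era syntax.
CREATE CUSTOM INDEX tweets_index ON tweets (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
    'refresh_seconds': '1',
    'schema': '{
        fields: {
            body: {type: "text", analyzer: "english"},
            time: {type: "date", pattern: "yyyy/MM/dd"}
        }
    }'
};

-- Full-text search combined with a range filter, expressed in CQL:
SELECT * FROM tweets WHERE lucene = '{
    filter: {type: "range", field: "time", lower: "2016/01/01"},
    query:  {type: "phrase", field: "body", value: "big data"}
}' LIMIT 100;
```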



Crossdata is a distributed framework and a fast, general-purpose computing system powered by Apache Spark. It unifies the interaction with different sources, supporting multiple datastore technologies.
  • High availability and scalability.
  • Speeds up query resolution through native access.
  • Supports batch and streaming queries.
  • Metadata discovery.
  • Improves and extends Apache Spark capabilities.
  • Can be deployed as a Spark library.


  • Unifies stream and batch processing using a common language
  • Ability to access different datastore technologies
  • Optimised connectors with native access for Cassandra, MongoDB, Elasticsearch
  • Users can easily add new connectors for native access
  • Extends existing datastore capabilities (joins, group by…)
  • Easy-to-use SQL-like language
  • Java/Scala API and Query Builder
  • ODBC/JDBC drivers to link with existing BI tools
  • Distributed scalable fault-tolerant P2P architecture
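
As an illustrative sketch (catalog, table and column names are made up, not a real deployment), a single SQL-like statement can combine data living in different datastores:

```sql
-- Hypothetical example: join a Cassandra table with a MongoDB collection.
-- Each side is resolved through its native connector when the datastore can
-- execute the operation, falling back to Spark otherwise.
SELECT c.customer_id, c.name, SUM(o.amount) AS total
FROM cassandra_catalog.customers c
JOIN mongo_catalog.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
```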


    Interactive CEP

    Complex event processing (CEP) combines data from multiple sources to infer events or patterns that suggest more complex occurrences; as a technique, it helps discover complex events by analyzing and correlating other events. Stratio Decision is our complex event processing engine, built on Spark Streaming. It is the result of combining the power of Spark Streaming as a continuous computing framework with the Siddhi CEP engine for complex event processing.

    Integration of Drools and Stratio Decision

    What is Drools?

    Drools is a Business Rules Management System (BRMS) solution. It provides a core Business Rule Engine (BRE) and a web authoring and rules management application (Drools Workbench).

    Drools is a rule engine that uses the rule-based approach to implement an expert system. A production rule is a two-part structure that uses first-order logic for reasoning over a knowledge representation.

    The inference engine matches the rules against the facts (objects) in memory and can match the next set of rules based on the changed facts.
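
    A minimal Drools rule sketch in DRL; the `SensorEvent` fact type, its fields and the threshold are assumptions made for illustration, not part of Stratio Decision:

```
// Hypothetical rule: when a matching fact is present in working memory,
// the inference engine fires the consequence, which may in turn change
// facts and cause other rules to match.
rule "High temperature alert"
when
    $e : SensorEvent( temperature > 90 )
then
    System.out.println("Alert for sensor " + $e.getSensorId());
end
```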

    Benefits of Drools & Decision

    • Apply powerful business logic in a Decision flow.
    • Business logic changes can be applied live.
    • Web application for rules development and management.
    • Business Rules versioning.
    • Project rules management with git.
    • Project rules as a maven artifact.
    • Rules distribution through a maven repository.

    Stream Query Language

    SQL-Like language

    1. Stream Definition Language (SDL).
      • Create, alter or drop a stream, add new queries or remove existing queries.
    2. Stream Manipulation Language (SML)
      • Insert events into a stream and list the existing streams in the engine.
    3. Stream Action Language (SAL).
      • Listen to a stream (Kafka), save the stream to Cassandra or MongoDB (auto-creation of tables), index the stream to ElasticSearch… here you should find useful operations ready to use.
      • Start & stop each action on-demand.
    4. Built-in functions
      • Auditing all the requests in the streaming engine (Cassandra)
      • Statistics (requests per operation, requests per stream…)
      • Failover system (recovering windows, streams and queries from Cassandra)
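
    A hedged sketch of how SDL, SML and SAL statements fit together; the stream and column names are illustrative, and the exact keywords may differ between versions:

```
-- SDL: define a stream and attach a continuous query
CREATE STREAM sensor_grid (name STRING, ind INT, data DOUBLE);
ADD QUERY 'from sensor_grid select name, avg(data) as data
           group by name insert into sensor_grid_avg';

-- SML: push an event into the stream
INSERT INTO sensor_grid (name, ind, data) VALUES ('s1', 1, 21.5);

-- SAL: start persisting the derived stream and listen to it on Kafka
SAVE TO CASSANDRA sensor_grid_avg;
LISTEN sensor_grid_avg;
```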


    With Stratio Decision you can launch ad-hoc queries (and remove them) using an SQL-like language. Queries let you connect streams or operate on the events of a stream in real time. Queries are continuous: they only start working once you add them to the engine. There are many CEP operators that you can use in your queries:

    • Filtering.
    • Projection.
    • In-built functions.
    • Windows (time and length).
    • Join.
    • Event Sequences.
    • Event Patterns.
    • Output rate limiting.
    • Custom windows, custom functions.
      • Example: from sensor_grid #window.length(10) select name, ind, avg(data) as data group by name insert into sensor_grid_avg for current-events

    Shell & API

    Is Stratio Streaming multi-persistence?

    Of course: we have included ready-to-use actions in the engine that allow you, at any time, to start or stop saving all the events of a stream to the persistence of your choice: MongoDB, Cassandra or Elasticsearch.

    The engine takes care of creating keyspaces, tables, collections, indexes or whatever it needs to properly store the events (what’s more, if the stream is changed by an alter request, Stratio Streaming will also change the persistence for you).

    Can I work with temporal windows?

    Time is a first-class citizen in a CEP engine, so yes, you can work with temporal windows. Length windows and others are also supported, and there are many operators for your queries (avg, count, sum, max, min, patterns, sequences, joins…).

    How can I send data to the engine?

    Use the API or the shell provided by Stratio Streaming. You can send a very large number of events.




    Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.

    It is not designed only for logs; in fact, you can find a myriad of sources, sinks and transformations.

    In addition, a sink can be a final big data store, but also another real-time system (Apache Kafka, Spark Streaming).

    Stratio Ingestion is a fork of Apache Flume (1.6) where you can find:

    • Several bug fixes, some of them really important, such as Unicode support.
    • Several enhancements of Flume’s sources & sinks: ElasticSearch mapper, for example.
    • Custom sources and sinks, developed by Stratio:
      • SNMP (v1, v2c and 3).
      • Redis, Kafka.
      • MongoDB, JDBC and Cassandra.
      • Stratio Streaming (Complex Event Processing engine).
      • REST client, Flume agents stats.
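
    A hedged sketch of a Flume-style agent configuration wiring one of these custom components; the Stratio MongoDB sink class name and its properties are assumptions, so check the actual distribution for the real names:

```
# Hypothetical agent: read from a spooled directory and write to MongoDB.
agent.sources  = src1
agent.channels = ch1
agent.sinks    = sink1

# Standard Flume spooling-directory source
agent.sources.src1.type = spooldir
agent.sources.src1.spoolDir = /var/log/incoming
agent.sources.src1.channels = ch1

# In-memory channel between source and sink
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 10000

# Assumed class name and property for the Stratio MongoDB sink
agent.sinks.sink1.type = com.stratio.ingestion.sink.mongodb.MongoSink
agent.sinks.sink1.mongoUri = mongodb://localhost:27017/logs.events
agent.sinks.sink1.channel = ch1
```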



    Stratio Intelligence is the Data Intelligence layer of the Data-centric architecture. The main milestones of Stratio Intelligence are:

    • Two Big Data Science development environments (for R and Python)
    • Integration of Open Source distributed ML libraries
    • Real-time decision making with the trained models
    • Integration with the Stratio Platform
    • Full management of the knowledge life cycle, from development to production



    Stratio Manager is the easiest way to install, manage and monitor all the technology stack related to Spark.

    Choose your cluster size, select which software to install and let Stratio Manager do all the hard work.



    • Installation wizard.
    • EC2 deployment support.
    • On site deployment support.
    • Pre-installation health checker.
    • Platform monitoring.
    • Platform management.
    • Automatic resources assignment.
    • System alarm visualization.
    • Platform activity stream.
    • Remote logs visualization.
    • Remote support.



    Stratio Sparta is a pure Spark real-time aggregation engine. With absolutely no coding, you can simultaneously deploy several user-defined aggregation workflows, deciding which rollups and dimensions will be applied to the event stream, in real time.

    Each workflow has its own aggregation policy where you can select which input (Kafka, Flume, Twitter, etc.), output (MongoDB, Cassandra, etc.), event parser functions (decoding, enrichment, normalization), and aggregation functions (time-based, geo-range, hierarchical counting, sum, max, min, count, sumsquares, etc.) will be executed by Sparta.

    Without generating a single line of code, using an easy and friendly interface, you can define your aggregation policy needs, including:

    • Input: Where is the data coming from?
    • Output(s): Where should aggregate data be stored?
    • Dimension(s): Which fields will you need for your real-time needs?
    • Rollup(s): How do you want to aggregate the dimensions?
    • Transformation(s): Which functions should be applied before aggregation?
    • Save raw data: Do you want to save raw events?
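
    As a hedged illustration of how such a policy could be expressed as JSON (the field names below are illustrative, not the exact Sparta policy schema):

```
{
  "name": "sensor-aggregation",
  "input":   { "type": "Kafka",   "topic": "sensor_events" },
  "outputs": [ { "type": "MongoDB", "uri": "mongodb://localhost/metrics" } ],
  "dimensions": [ "sensorId", "region" ],
  "rollups": [ { "dimensions": ["region"], "timeGranularity": "minute" } ],
  "transformations": [ { "type": "morphline", "config": "normalize.conf" } ],
  "operators": [ "count", "sum", "avg" ],
  "saveRawData": true
}
```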




    Data is worthless if you don’t communicate it correctly. Stratio Viewer was born out of the need to explain the real value of data. Our content-centred approach to data lets you focus on what you really want to communicate: dashboards, reports, microsites... Stratio Viewer lets you use an array of widgets and a wide catalogue of data sources based on mature market standards.

    The following video is a short overview of some of the features and capabilities which Viewer offers you:

    Interesting facts

    Using the native connectors you can communicate with any source in its own language (SQL, CQL, Mongo QL, Lucene syntax, JSON...). Furthermore, if you don't know the source's dialect, you can use our SQL data sources and rely on your SQL skills to get data from any source.

    Based on OpenSocial

    OpenSocial is a public specification for creating web applications using standard technologies like HTML, CSS and JavaScript. It was originally developed by Google, Myspace and others to standardize common social networking APIs, but has evolved into a general platform for building web applications.

    • Standard:
      • Part of the W3C.
      • Portable.
      • Supported and used by big companies.
    • Lean:
      • Widgets are defined in XML.
      • It uses web technologies such as HTML5, JavaScript and CSS.
      • Isolated rendering engine.
      • Communication between widgets is available.
    • Extensible:
      • Easy development.
      • High-level customization.
      • Different data views.
      • Advanced configuration.
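
    A minimal OpenSocial gadget as defined by the public specification; any Viewer-specific extensions would sit on top of this basic shape:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Module>
  <!-- Widget metadata lives in ModulePrefs -->
  <ModulePrefs title="Hello widget">
    <Require feature="dynamic-height"/>
  </ModulePrefs>
  <!-- Rendering is plain HTML/CSS/JavaScript inside a CDATA block -->
  <Content type="html"><![CDATA[
    <h1>Hello from an OpenSocial widget</h1>
    <script>gadgets.window.adjustHeight();</script>
  ]]></Content>
</Module>
```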



    Apache Sqoop is a tool designed for efficiently transferring bulk data from structured datastores, such as relational databases, to other systems. Sqoop works in batch mode and is commonly used to import high volumes of data from large databases and store them in a data lake.

    Apache Sqoop works on top of Hadoop, but at Stratio we have adapted Sqoop to also work on top of Spark. In this way, Sqoop on top of Spark is a perfect tool to import high volumes of data efficiently and send them, for example, to Kafka to be ingested into your Big Data platform.

    Stratio lets you take the next step and run all the job operations that you used to run on a Hadoop cluster on a Spark cluster instead, giving you all the benefits of Spark.

    Transfer Data

    Traditional application management systems, that is, applications interacting with relational databases through an RDBMS, are one of the sources that generate Big Data. Such Big Data is stored in relational database servers in the relational database structure.

    When Big Data storages and analyzers of the Hadoop ecosystem, such as MapReduce, Hive, HBase, Cassandra or Pig, came into the picture, they required a tool to interact with relational database servers for importing and exporting the Big Data residing in them. Here, Sqoop occupies a place in the Hadoop ecosystem, providing feasible interaction between relational database servers and Hadoop’s HDFS.

    Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL or Oracle into Hadoop HDFS, and to export data from the Hadoop file system to relational databases. It is provided by the Apache Software Foundation.
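
    A typical Sqoop import invocation looks like this; the connection string, credentials, table and target directory are placeholders, not a real environment:

```shell
# Import a MySQL table into HDFS using 4 parallel map tasks
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/lake/orders \
  --num-mappers 4
```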




    From a high-level point of view, Stratio goSec is a centralized security layer that manages fine-grained access control over Big Data services, such as Apache HDFS, Apache Cassandra, Apache Kafka and Apache Zookeeper, as well as web applications, such as Stratio Viewer or Stratio Intelligence.

    Thanks to Stratio goSec, security administrators can easily manage policies for access to files, folders, topics, databases, tables, or znodes. These policies can be set for individual users or groups and then enforced across the platform.

    Stratio goSec can also manage the audit information collected in authentication or authorization events for deeper control of the environment.

    Stratio goSec currently supports authorization, authentication, auditing and security administration for the following components:

    • Apache HDFS
    • Apache Cassandra
    • Apache Kafka
    • Apache Zookeeper

    And single sign-on across:

    • Stratio Manager
    • Stratio Viewer
    • Stratio Intelligence
    • Stratio Gosec-Management
    • Stratio Sparta
    • Stratio Workflow Editor

    This is the high level view of Stratio goSec components:




    Stratio Morphlines adds some useful transformations to the default Kite SDK ones.

    Stratio Morphlines consists of several modules:

    • Commons:
      • Calculator: Make operations between fields and/or static numbers.
      • FieldFilter: Filter fields, including or excluding them.
      • ReadXml: Read XML from the header or body and apply an XPath expression.
      • RelationalFilter: Drop fields if they do not fulfil a relational condition.
      • Rename: Rename a field.
      • TimeFilter: Filter a time field between specified dates.
      • HeadersToBody: Write all your headers, in JSON format, into the _attachment_body field.
      • ContainsAnyOf: Succeeds if all field values of the given named fields contain any of the given values, and fails otherwise. Multiple fields can be named, in which case a logical AND is applied to the results.
    • GeoIP: Works like the Kite command of the same name. It saves the ISO code and the longitude-latitude pair in two header fields.
    • GeoLocateAirports: Get the longitude and latitude of an airport from its airport code (for both origin and destination).
    • NLP: Detects the language of a specific header and puts the ISO 639-1 code into another header.
    • WikipediaCleaner: Cleans the Wikipedia markup from a text.
    • CheckpointFilter: Get the last checkpoint value via a parametrized handler and filter records by a parametrized field. Periodically updates checkpoint values.
    • LDAP: Extract RDNs from an LDAP string into separate headers.
    • Referer Parser: Based on the Snowplow Referer Parser, with some improvements. It is able to parse both organic and paid links.
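
    A hedged sketch of a morphline configuration using the standard Kite SDK HOCON syntax; the import pattern and the specific Stratio command names and parameters are assumptions, so check the module documentation for the real ones:

```
morphlines : [
  {
    id : example
    importCommands : ["org.kitesdk.**", "com.stratio.**"]
    commands : [
      # Parse each input line as a record (standard Kite command)
      { readLine { charset : UTF-8 } }
      # Assumed Stratio command and parameters: rename a field
      { rename { field : old_name, newName : new_name } }
      # Log the transformed record (standard Kite command)
      { logInfo { format : "record: {}", args : ["@{}"] } }
    ]
  }
]
```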