Streaming Expressions in Apache Solr
The Structured Query Language (SQL) engine in Solr is built on top of Solr’s Streaming API, also known as streaming expressions. Streaming expressions provide a simple yet powerful stream processing language for SolrCloud, with support for parallel relational algebra and real-time map-reduce.
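Streaming expressions are usually sent to a collection's /stream request handler in the expr parameter. The results come back as a JSON result-set whose last tuple is an EOF marker. The Python sketch below shows that round trip using only the standard library. It is a minimal illustration, not an official client: the node URL (localhost:8983) is an assumption, and the helper names build_stream_request, parse_tuples and run_expression are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed node URL for illustration; point it at your own SolrCloud node.
SOLR_URL = "http://localhost:8983/solr"


def build_stream_request(collection: str, expr: str) -> urllib.request.Request:
    """Build a POST to the collection's /stream handler carrying `expr`."""
    body = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    return urllib.request.Request(
        f"{SOLR_URL}/{collection}/stream",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_tuples(payload: str) -> list:
    """Return the tuples of a /stream response, dropping the trailing EOF tuple."""
    docs = json.loads(payload)["result-set"]["docs"]
    return [d for d in docs if not d.get("EOF")]


def run_expression(collection: str, expr: str) -> list:
    """Send the expression to a live Solr node and return its tuples."""
    with urllib.request.urlopen(build_stream_request(collection, expr)) as resp:
        return parse_tuples(resp.read().decode("utf-8"))
```

Calling run_expression needs a running SolrCloud node, while parse_tuples can be exercised offline on a saved response.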
Distributed Joins: Streaming expressions support distributed joins through the following join types:
- Inner Join
- Left Outer Join
- Hash Join
- Outer Hash Join
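The join types above differ mainly in what they require from their input streams. innerJoin (like leftOuterJoin) merges two streams that are both sorted on the join field, whereas hashJoin and outerHashJoin read one stream (the hashed side) into memory, so the inputs can arrive in any order. The pure-Python sketch below models the inner merge join and the (outer) hash join on lists of dicts. The function names are hypothetical, and the sketch simplifies Solr's behavior, for example by joining on a single field that has the same name on both sides.

```python
from collections import defaultdict


def merge_inner_join(left, right, key):
    """Sort-merge inner join: both lists must already be sorted ascending on `key`."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right tuples sharing this key...
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # ...and pair every left tuple with the same key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    yield {**left[i], **r}  # in this sketch, right-side values win on name clashes
                i += 1
            j = j_end


def hash_join(left, hashed, key, outer=False):
    """Hash join: index `hashed` in memory, then stream `left` in any order.
    With outer=True, unmatched left tuples are emitted as-is (outer hash join)."""
    index = defaultdict(list)
    for t in hashed:
        index[t[key]].append(t)
    for lt in left:
        matches = index.get(lt[key], [])
        if matches:
            for r in matches:
                yield {**lt, **r}
        elif outer:
            yield dict(lt)
```

The merge join walks each input once but depends on the sort order; the hash join trades that requirement for holding the hashed side in memory.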
For example, the following expression inner-joins two collections on fieldP:
innerJoin(search(collection1, q=*:*, fl="fieldP, fieldQ, fieldR", ...),
          search(collection2, q=*:*, fl="fieldP, fieldM, fieldN", ...),
          on="fieldP=fieldP")
Rollup streaming expression: It groups tuples that share a common field value and computes aggregates for each group. The underlying stream must be sorted on the grouping field. Example:
rollup(search(collection1, qt="/export", q="*:*", fl="id,course,price", sort="course asc"), over="course", count(*), max(price))
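To make the rollup semantics concrete, here is a small Python model that groups consecutive tuples sharing the over field and emits count(*) and max() per group. The rollup function and its parameters are hypothetical; the sketch only mirrors the shape of the output tuples, not Solr's implementation.

```python
from itertools import groupby
from operator import itemgetter


def rollup(tuples, over, metric):
    """Emit one tuple per run of consecutive tuples sharing `over`,
    with count(*) and max(metric). Input should be sorted on `over`."""
    for key, group in groupby(tuples, key=itemgetter(over)):
        values = [t[metric] for t in group]
        yield {over: key, "count(*)": len(values), f"max({metric})": max(values)}
```

Because grouping only looks at consecutive tuples, unsorted input splits a group into several output tuples, which is why the search in the rollup example sorts on course.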
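Facet streaming expression: It pushes the aggregation down into the search engine using Solr's JSON Facet API and returns one tuple per bucket. Example:
facet(collection1, q="*:*", buckets="course", bucketSorts="count(*) desc", bucketSizeLimit=1000, count(*), sum(price), max(popularity))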
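Unlike rollup, facet does not consume a sorted stream: the buckets are computed inside the search engine, then ordered by bucketSorts and truncated to bucketSizeLimit. The Python sketch below models only the resulting tuples, not the pushdown itself. The facet function and its parameters are hypothetical stand-ins for the buckets, bucketSorts and bucketSizeLimit settings in the example.

```python
from collections import defaultdict


def facet(docs, bucket, sum_field, max_field, bucket_size_limit=1000):
    """Aggregate docs into buckets keyed on `bucket` (input order does not matter),
    order buckets by count(*) descending, and keep at most bucket_size_limit."""
    groups = defaultdict(list)
    for d in docs:
        groups[d[bucket]].append(d)
    buckets = [
        {
            bucket: key,
            "count(*)": len(members),
            f"sum({sum_field})": sum(m[sum_field] for m in members),
            f"max({max_field})": max(m[max_field] for m in members),
        }
        for key, members in groups.items()
    ]
    buckets.sort(key=lambda b: b["count(*)"], reverse=True)  # like bucketSorts="count(*) desc"
    return buckets[:bucket_size_limit]  # like bucketSizeLimit
```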
Streaming expressions support a wide range of stream processing use cases, including:
- Continuous push streaming
- Continuous pull streaming
- Request/Response streaming
- Shuffling MapReduce aggregation
- Pushdown faceted aggregation
- Parallel relational algebra (distributed joins, intersections, unions, complements)
- Publish/subscribe messaging
- Distributed graph traversal
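Several of the relational operations listed above, such as intersections and complements, are streaming merges over inputs sorted on a shared field; Solr exposes them as the intersect and complement decorators. The Python sketch below models that behavior with one linear walk over two sorted lists. It is an illustration only, with hypothetical function names, and it keeps duplicate tuples from the first input.

```python
def _filter_sorted(a, b, key, keep_matches):
    """Walk two lists sorted ascending on `key`; keep tuples of `a` whose key
    does (keep_matches=True) or does not (keep_matches=False) appear in `b`."""
    j = 0
    for t in a:
        k = t[key]
        # Advance through `b` until its key is no longer smaller than k.
        while j < len(b) and b[j][key] < k:
            j += 1
        found = j < len(b) and b[j][key] == k
        if found == keep_matches:
            yield t


def intersect(a, b, key):
    """Tuples of `a` whose key also appears in `b`."""
    return list(_filter_sorted(a, b, key, True))


def complement(a, b, key):
    """Tuples of `a` whose key does not appear in `b`."""
    return list(_filter_sorted(a, b, key, False))
```

Since both inputs are sorted, each list is scanned only once, which is what lets these operations run as streams rather than loading either side into memory.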
To better understand how Solr and Hadoop work together, read the blog post "Solr + Hadoop = Big Data Love".