Commit 68f536bf authored by Tibo

readme

parent b89ebe3d
Pipeline #2133 passed with stage
in 15 seconds
@@ -4,4 +4,92 @@
[![pipeline status](https://gitlab.cylab.be/tibo/php-spark/badges/master/pipeline.svg)](https://gitlab.cylab.be/tibo/php-spark/commits/master)
[![coverage report](https://gitlab.cylab.be/tibo/php-spark/badges/master/coverage.svg)](https://gitlab.cylab.be/tibo/php-spark/commits/master)
**php-spark** is a wrapper around arrays that mimics the MapReduce API of Apache Spark.
```php
use Cylab\Spark\Dataset;

$data = new Dataset([1, 2, 3, 4]);
$result = $data
    // double every element: [2, 4, 6, 8]
    ->map(function ($v) {
        return 2 * $v;
    })
    // sum the doubled values
    ->reduce(function ($v, $agg) {
        return $agg + $v;
    });
$result == 20;
```
php-spark is NOT a PHP driver for Apache Spark (although I wish one existed).
## Dataset
A dataset is an **immutable array of data**. It is the equivalent of a
**Spark RDD (Resilient Distributed Dataset)**.
```php
use Cylab\Spark\Dataset;
$d = new Dataset([1, 2, 3, 4]);
var_dump($d->collect());
```
## Map
Map applies the provided function to all elements in the dataset and
returns a new dataset containing the result of the map operation.
```php
$d2 = $d->map(function ($v) { return 2 * $v; });
```
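For intuition, the same transformation on a plain PHP array can be written with the built-in `array_map`. This is only a sketch of the equivalent behaviour, not a description of how php-spark implements `map` internally; the variable names `$values` and `$doubled` are illustrative.

```php
// Plain-PHP equivalent of the map example above (illustrative only).
$values = [1, 2, 3, 4];

// array_map returns a new array and leaves $values untouched.
$doubled = array_map(function ($v) {
    return 2 * $v;
}, $values);

var_dump($doubled); // [2, 4, 6, 8]
```

As with a dataset, the input is not modified: `$values` still holds `[1, 2, 3, 4]` afterwards.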
## Reduce
The reduce function you provide must take two parameters: the current value
and the aggregated value.
```php
use Cylab\Spark\Dataset;
$d = new Dataset([1, 2, 3, 4]);
$result = $d->reduce(function ($v, $agg) {
    return $agg + $v;
});
$result == 10;
```
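As a point of comparison, the same sum on a plain PHP array can be sketched with the built-in `array_reduce`. This is not php-spark code, and the explicit initial value of `0` is an assumption made for this sketch; the README does not say how the library seeds the aggregate.

```php
// Plain-PHP equivalent of the reduce example above (illustrative only).
$values = [1, 2, 3, 4];

// Note the argument order: array_reduce passes the aggregated value first,
// then the current value, which is the reverse of the callback shown above.
$sum = array_reduce($values, function ($agg, $v) {
    return $agg + $v;
}, 0);

var_dump($sum); // int(10)
```

The reversed parameter order is the main thing to watch for when porting callbacks between `array_reduce` and a dataset's `reduce`.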
## Tuple
Some methods expect the dataset to contain a list of `<key, value>` tuples.
```php
use Cylab\Spark\Dataset;
use Cylab\Spark\Tuple;
$strings = ["foe", "bar", "foe"];
$d = new Dataset($strings);
$d2 = $d->map(function($s) { return new Tuple($s, 1); });
```
## ReduceByKey
For this method to work, the input dataset must be a list of `<key, value>` tuples.
The reduce function is then applied to all elements with the same key.
```php
$counts = $d2->reduceByKey(function ($count, $sum) {
    return $sum + $count;
});
```
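To make the grouping step concrete, here is a minimal plain-PHP sketch of reducing by key. It is not the library's implementation: `[key, value]` arrays stand in for `Tuple` objects, the names `$pairs`, `$reduce` and `$totals` are hypothetical, and it assumes the first value seen for a key seeds the aggregate, which the README does not specify.

```php
// Illustrative only: [key, value] pairs stand in for Tuple objects.
$pairs = [["foe", 1], ["bar", 1], ["foe", 1]];

// Same callback shape as above: current value first, aggregate second.
$reduce = function ($count, $sum) {
    return $sum + $count;
};

$totals = [];
foreach ($pairs as [$key, $value]) {
    // The first value seen for a key starts the aggregate (assumption);
    // later values with the same key are folded in with $reduce.
    $totals[$key] = array_key_exists($key, $totals)
        ? $reduce($value, $totals[$key])
        : $value;
}

var_dump($totals); // ["foe" => 2, "bar" => 1]
```

Only values that share a key are ever passed to the reduce function together, which is why the input must consist of key/value pairs.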
## First
Get the first element of a dataset.
```php
// Tuple<"foe", 2>
var_dump($counts->first());
```