Commit a73e50da authored by Tibo

Add table of implemented methods and list of future work

parent b6d5f9be
Pipeline #2149 passed with stage in 21 seconds
# php-spark
[![pipeline status](https://gitlab.cylab.be/tibo/php-spark/badges/master/pipeline.svg)](https://gitlab.cylab.be/tibo/php-spark/commits/master) [![coverage report](https://gitlab.cylab.be/tibo/php-spark/badges/master/coverage.svg)](https://gitlab.cylab.be/tibo/php-spark/commits/master)
**php-spark** is a wrapper around arrays that mimics the MapReduce API of Apache Spark.
@@ -35,6 +34,31 @@ $d = new Dataset([1, 2, 3, 4]);
var_dump($d->collect());
```
## Transformations
Transformations return another dataset.
| Method | Description |
| --- | --- |
| map(func) | Return a new dataset formed by passing each element of the source through the function func. |
| distinct() | Return a new dataset that contains the distinct elements of the source dataset. |
| reduceByKey(func) | When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V,V) => V. |
| groupByKey() | When called on a dataset of (K, V) pairs, returns a dataset of (K, V[]) pairs. |
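
To make the table above concrete, here is a minimal sketch of what each transformation computes, written against plain PHP arrays. It is not php-spark's implementation: the `sketch_*` function names are hypothetical, and key/value pairs are modelled as `[key, value]` arrays purely for illustration (the library has its own tuple type).

```php
<?php
// Plain-array sketch of the four transformations; not php-spark's code.
// Assumption: a (K, V) pair is modelled as a two-element array [key, value].

function sketch_map(array $data, callable $func): array
{
    // Apply $func to every element and return the results.
    return array_map($func, $data);
}

function sketch_distinct(array $data): array
{
    // array_unique keeps the first occurrence; array_values reindexes.
    return array_values(array_unique($data));
}

function sketch_group_by_key(array $pairs): array
{
    // Collect the values of each key, in order of first appearance.
    $groups = [];
    foreach ($pairs as [$key, $value]) {
        $groups[$key][] = $value;
    }
    $result = [];
    foreach ($groups as $key => $values) {
        $result[] = [$key, $values];
    }
    return $result;
}

function sketch_reduce_by_key(array $pairs, callable $func): array
{
    $result = [];
    foreach (sketch_group_by_key($pairs) as [$key, $values]) {
        // Fold the values of one key, seeding with the first value.
        $first = array_shift($values);
        $result[] = [$key, array_reduce($values, $func, $first)];
    }
    return $result;
}

$words = [["foe", 1], ["bar", 1], ["foe", 1]];
$counts = sketch_reduce_by_key($words, fn($a, $b) => $a + $b);
var_dump($counts);
```

Because the sketch seeds each fold with the first value of a key, the reduce function only ever sees values of type V, which matches the (V,V) => V requirement stated in the table.
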
## Actions
Actions return a result that is not a dataset, such as an array or a single value.
| Method | Description |
| --- | --- |
| reduce(func) | Aggregate the elements of the dataset using a function func (which takes two arguments and returns one). |
| collect() | Return all the elements of the dataset as an array. |
| count() | Return the number of elements in the dataset. |
| first() | Return the first element of the dataset (similar to take(1)). |
| take(n) | Return an array with the first n elements of the dataset. |
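
The actions above can be sketched the same way on plain PHP arrays. This is an illustration of the semantics only, not the library's code: the `sketch_*` names are hypothetical, and returning `null` for an empty input is an assumption of this sketch.

```php
<?php
// Plain-array sketch of the actions; not php-spark's code.

function sketch_reduce(array $data, callable $func)
{
    // Seed with the first element so no neutral value is needed.
    // Assumption: an empty input yields null.
    $first = array_shift($data);
    return array_reduce($data, $func, $first);
}

function sketch_take(array $data, int $n): array
{
    // First $n elements, reindexed from 0.
    return array_slice($data, 0, $n);
}

function sketch_first(array $data)
{
    // Same element as take(1), but unwrapped from the array.
    return sketch_take($data, 1)[0] ?? null;
}

$data = [1, 2, 3, 4];
var_dump(sketch_reduce($data, fn($a, $b) => $a + $b)); // int(10)
var_dump(count($data));                                // int(4)
var_dump(sketch_first($data));                         // int(1)
var_dump(sketch_take($data, 2));
```

Note the difference between the last two calls: `first()` yields a single element, while `take(1)` yields a one-element array.
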
## Map
Map applies the provided function to all elements in the dataset and
@@ -93,3 +117,19 @@ Get the first element of a dataset.
// Tuple<"foe", 2>
var_dump($counts->first());
```
## Future work
* flatMap
* sample
* union
* intersection
* aggregateByKey
* sortByKey
* join
* cartesian
* takeSample
* takeOrdered
* countByKey
* saveAsObjectFile
* saveAsJsonFile
\ No newline at end of file