RDD.distinct(numPartitions=None)
Return a new RDD containing the distinct elements in this RDD.
New in version 0.7.0.
Parameters
numPartitions : int, optional
the number of partitions in the new RDD
Returns
RDD
a new RDD containing the distinct elements
See also
RDD.countApproxDistinct()
Examples
>>> sorted(sc.parallelize([1, 1, 2, 3]).distinct().collect())
[1, 2, 3]
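The following is a conceptual, single-process sketch of how a distinct operation over partitioned data can work. It is not PySpark's implementation. The helper `distinct_sketch`, its `partitions` argument (a list of lists standing in for an RDD's partitions), and its `num_partitions` argument are hypothetical names introduced only for illustration. The idea shown is that routing each element to an output partition by its hash puts equal elements in the same partition, where duplicates can be dropped locally.

```python
def distinct_sketch(partitions, num_partitions=None):
    """Imitate a distributed distinct in one process (illustration only).

    `partitions` is a list of lists standing in for an RDD's partitions.
    This is a hypothetical helper, not part of PySpark.
    """
    if num_partitions is None:
        # Assumption for this sketch: keep the input partition count.
        num_partitions = len(partitions)

    # "Shuffle" step: send each element to an output partition chosen by
    # its hash, so equal elements always meet in the same partition.
    buckets = [dict() for _ in range(num_partitions)]
    for part in partitions:
        for x in part:
            # A dict key can appear only once, which removes duplicates.
            buckets[hash(x) % num_partitions].setdefault(x, None)

    # Keep only the keys: each distinct element survives exactly once.
    return [list(bucket) for bucket in buckets]


result = distinct_sketch([[1, 1], [2, 3, 1]], num_partitions=2)
flat = sorted(x for part in result for x in part)
print(flat)  # [1, 2, 3]
```

As in the doctest above, the result is sorted before printing because the order of elements after deduplication is not something to rely on.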