How to run radius searches quickly and efficiently in PySpark



Modules used

from geopy.distance import great_circle
from pyspark.sql.functions import lit, struct, udf

# Python UDF: great-circle distance in kilometers between two
# (latitude, longitude) structs.
@udf("float")
def great_circle_udf(x, y):
    return great_circle(x, y).kilometers
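
For reference, the distance the UDF returns is a great-circle distance on a sphere, which can be approximated with the haversine formula. The following is a minimal sketch using only the Python standard library, not the geopy implementation; `haversine_km` and `EARTH_RADIUS_KM` are illustrative names that do not appear on the original page, and the radius value is an assumed mean Earth radius.

```python
import math

EARTH_RADIUS_KM = 6371.009  # assumption: mean Earth radius in kilometers


def haversine_km(p1, p2):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


center = (37.5680423, 126.8264086)
print(haversine_km(center, center))  # 0.0
```

A function like this could be used to sanity-check individual distances outside Spark before running the query on the full table.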



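# Search center as a (latitude, longitude) struct; tbl_NHotel is an existing DataFrame of hotels.
point = struct(lit(37.5680423), lit(126.8264086))
# Keep only the rows within 0.05 km (50 m) of the center.
nearNaver = tbl_NHotel.filter(great_circle_udf(point, struct(tbl_NHotel.latitude, tbl_NHotel.longitude)) < 0.05)
nearNaver.show(10)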
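
Because a Python UDF is evaluated row by row, one common way to reduce the work is to narrow the candidates with a latitude/longitude bounding box first and compute the exact distance only for the rows that remain. The following is a rough sketch of how such a box could be derived; `bounding_box` is a hypothetical helper that is not part of the original page, and it assumes a spherical Earth and a search circle that stays away from the poles and the 180° meridian.

```python
import math

EARTH_RADIUS_KM = 6371.009  # assumption: mean Earth radius in kilometers


def bounding_box(lat, lon, radius_km):
    """Return (lat_min, lat_max, lon_min, lon_max) in degrees around the search circle.

    The box is intended to be slightly larger than the circle. It assumes the
    circle does not touch a pole or cross the 180° meridian.
    """
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # A degree of longitude shrinks with cos(latitude), so size the box using
    # the edge of the circle that lies farthest from the equator.
    max_abs_lat = min(abs(lat) + dlat, 89.9)
    dlon = math.degrees(
        radius_km / (EARTH_RADIUS_KM * math.cos(math.radians(max_abs_lat)))
    )
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


lat_min, lat_max, lon_min, lon_max = bounding_box(37.5680423, 126.8264086, 0.05)
print(lat_min, lat_max, lon_min, lon_max)
```

With these bounds, a condition such as `tbl_NHotel.latitude.between(lat_min, lat_max) & tbl_NHotel.longitude.between(lon_min, lon_max)` could be applied before the `great_circle_udf` filter, so the UDF only runs on rows that are already close to the center.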


Result
+-------+-----+-------------+----------+-----------+-------+-------+
|     id|price|    goodsname|  latitude|  longitude|region1|region2|
+-------+-----+-------------+----------+-----------+-------+-------+
|3054798|86478|라마다 앙코르 서울 마곡|37.5680423|126.8264086|     서울|    강서구|
+-------+-----+-------------+----------+-----------+-------+-------+
