The string you pass to SQLContext, it gets evaluated in the scope of the SQL environment. It is not able to capture the closure.
In order to pass a variable, you'll have to do it explicitly using string formatting:
df = sc.parallelize([(1, "foo"), (2, "x"), (3, "bar")]).toDF(("k", "v"))
df.registerTempTable("df")
sqlContext.sql("SELECT * FROM df WHERE v IN {0}".format(("foo", "bar"))).count()
## 2
It is also obvious that, using something like this won’t be your priority in a "real" SQL environment due to security considerations but it shouldn't matter here.
In practice DataFrame DSL is a must choice when you want to create dynamic queries:
from pyspark.sql.functions import col
df.where(col("v").isin({"foo", "bar"})).count()
## 2
It is easy to build and compose and handles all details of HiveQL / Spark SQL for you.
If you want to know more about Spark, then do check out this awesome video tutorial: