bgromov · November 3, 2022 10:10 · Aug 20, 2021 · Aug 20, 2021 · Aug 20, 2021 · Aug 20, 2021
diff --git a/README.md → pandas-query.md b/README.md → pandas-query.md
diff --git a/README.md b/README.md
@@ -33,7 +33,7 @@ df.query('a.apply(lambda x: x[0]) > 5') # NotImplementedError: 'Lambda' nodes ar
 Luckily, we can solve this problem with [custom accessors](https://pandas.pydata.org/pandas-docs/stable/development/extending.html). We can extend the `pandas.Series` and extract necessary information from values of complex types.
 
 
-First, we have to register, an accessor:
+First, we have to register an accessor:
 
 ```python
 

diff --git a/README.md b/README.md
@@ -1,60 +1,77 @@
 ## Pandas: `DataFrame.query()` for complex data types
 
-This method allows one to query data from pandas DataFrame where cells are of complex data types, e.g. `object`, `list`, `dict`.
+Consider the following pandas `DataFrame`:
 
-1. Register custom `pandas.Series` accessor:
+```python
+df = pd.DataFrame([[[1, 2, 3]], [[4, 5, 6]], [[7, 8, 9]], [[0, 1, 2]]], columns=['a'])
+```
 
-    ```python
+```
+           a
+0  [1, 2, 3]
+1  [4, 5, 6]
+2  [7, 8, 9]
+3  [0, 1, 2]
+```
 
-    import pandas as pd
+How do we query all rows where the first element is greater than 5?
 
-    @pd.api.extensions.register_series_accessor('list')
-    class ListAccessor:
-        def __init__(self, pandas_obj):
-            self._obj = pandas_obj
+One of the ways would be to use `DataFrame.query()`. Unfortunatelly, there is no way to access individual elements of cell values, i.e., we can't do this:
 
-        def __getitem__(self, index):
-            return self._obj.apply(lambda x: x[index])
-    ```
+```python
+df.query('a[0] > 5') # TypeError: '>' not supported between instances of 'list' and 'int'
+```
 
-2. Use it with `query` or `eval`:
+Clearly, that happens because the item accessor selects a row number 0 which is then compared to a scalar number.
 
-    Create a `DataFrame`:
+Another approach would be to extract necessary data using a lambda function within `apply()` method; however, it is not supported yet:
 
-    ```python
-    df = pd.DataFrame([[[1, 2, 3]], [[4, 5, 6]], [[7, 8, 9]], [[0, 1, 2]]], columns=['a'])
-    ```
+```python
+df.query('a.apply(lambda x: x[0]) > 5') # NotImplementedError: 'Lambda' nodes are not implemented
+```
 
-    ```
-               a
-    0  [1, 2, 3]
-    1  [4, 5, 6]
-    2  [7, 8, 9]
-    3  [0, 1, 2]
-    ```
+Luckily, we can solve this problem with [custom accessors](https://pandas.pydata.org/pandas-docs/stable/development/extending.html). We can extend the `pandas.Series` and extract necessary information from values of complex types.
 
-    Eval:
 
-    ```python
-    df.eval('a.list[1]')
-    ```
+First, we have to register, an accessor:
 
-    ```
-    0    2
-    1    5
-    2    8
-    3    1
-    Name: a, dtype: int64
-    ```
+```python
 
-    Query:
+import pandas as pd
 
-    ```python
-    df.query('a.list[2] > 4')
-    ```
+@pd.api.extensions.register_series_accessor('list')
+class ListAccessor:
+    def __init__(self, pandas_obj):
+        self._obj = pandas_obj
 
-    ```
-               a
-    1  [4, 5, 6]
-    2  [7, 8, 9]
-    ```
+    def __getitem__(self, index):
+        return self._obj.apply(lambda x: x[index])
+```
+
+Then, we can immediately use it with `query()` or `eval()`:
+
+Eval:
+
+```python
+df.eval('a.list[1]')
+```
+
+```
+0    2
+1    5
+2    8
+3    1
+Name: a, dtype: int64
+```
+
+Query:
+
+```python
+df.query('a.list[2] > 4')
+```
+
+```
+           a
+1  [4, 5, 6]
+2  [7, 8, 9]
+```
diff --git a/README.md b/README.md
@@ -1,3 +1,5 @@
+## Pandas: `DataFrame.query()` for complex data types
+
 This method allows one to query data from pandas DataFrame where cells are of complex data types, e.g. `object`, `list`, `dict`.
 
 1. Register custom `pandas.Series` accessor:

diff --git a/README.md b/README.md
@@ -2,57 +2,57 @@ This method allows one to query data from pandas DataFrame where cells are of co
 
 1. Register custom `pandas.Series` accessor:
 
-```python
+    ```python
 
-import pandas as pd
+    import pandas as pd
 
-@pd.api.extensions.register_series_accessor('list')
-class ListAccessor:
-    def __init__(self, pandas_obj):
-        self._obj = pandas_obj
+    @pd.api.extensions.register_series_accessor('list')
+    class ListAccessor:
+        def __init__(self, pandas_obj):
+            self._obj = pandas_obj
 
-    def __getitem__(self, index):
-        return self._obj.apply(lambda x: x[index])
-```
+        def __getitem__(self, index):
+            return self._obj.apply(lambda x: x[index])
+    ```
 
 2. Use it with `query` or `eval`:
 
-Create a `DataFrame`:
-
-```python
-df = pd.DataFrame([[[1, 2, 3]], [[4, 5, 6]], [[7, 8, 9]], [[0, 1, 2]]], columns=['a'])
-```
-
-```
-           a
-0  [1, 2, 3]
-1  [4, 5, 6]
-2  [7, 8, 9]
-3  [0, 1, 2]
-```
-
-Eval:
-
-```python
-df.eval('a.list[1]')
-```
-
-```
-0    2
-1    5
-2    8
-3    1
-Name: a, dtype: int64
-```
-
-Query:
-
-```python
-df.query('a.list[2] > 4')
-```
-
-```
-           a
-1  [4, 5, 6]
-2  [7, 8, 9]
-```
+    Create a `DataFrame`:
+
+    ```python
+    df = pd.DataFrame([[[1, 2, 3]], [[4, 5, 6]], [[7, 8, 9]], [[0, 1, 2]]], columns=['a'])
+    ```
+
+    ```
+               a
+    0  [1, 2, 3]
+    1  [4, 5, 6]
+    2  [7, 8, 9]
+    3  [0, 1, 2]
+    ```
+
+    Eval:
+
+    ```python
+    df.eval('a.list[1]')
+    ```
+
+    ```
+    0    2
+    1    5
+    2    8
+    3    1
+    Name: a, dtype: int64
+    ```
+
+    Query:
+
+    ```python
+    df.query('a.list[2] > 4')
+    ```
+
+    ```
+               a
+    1  [4, 5, 6]
+    2  [7, 8, 9]
+    ```
diff --git a/README.md b/README.md
@@ -17,31 +17,42 @@ class ListAccessor:
 
 2. Use it with `query` or `eval`:
 
+Create a `DataFrame`:
+
 ```python
 df = pd.DataFrame([[[1, 2, 3]], [[4, 5, 6]], [[7, 8, 9]], [[0, 1, 2]]], columns=['a'])
+```
 
+```
            a
 0  [1, 2, 3]
 1  [4, 5, 6]
 2  [7, 8, 9]
 3  [0, 1, 2]
+```
 
+Eval:
 
+```python
 df.eval('a.list[1]')
+```
 
+```
 0    2
 1    5
 2    8
 3    1
 Name: a, dtype: int64
+```
 
+Query:
 
+```python
 df.query('a.list[2] > 4')
+```
 
+```
            a
 1  [4, 5, 6]
 2  [7, 8, 9]
 ```
-
-
-
diff --git a/README.md b/README.md
@@ -0,0 +1,47 @@
+This method allows one to query data from pandas DataFrame where cells are of complex data types, e.g. `object`, `list`, `dict`.
+
+1. Register custom `pandas.Series` accessor:
+
+```python
+
+import pandas as pd
+
+@pd.api.extensions.register_series_accessor('list')
+class ListAccessor:
+    def __init__(self, pandas_obj):
+        self._obj = pandas_obj
+
+    def __getitem__(self, index):
+        return self._obj.apply(lambda x: x[index])
+```
+
+2. Use it with `query` or `eval`:
+
+```python
+df = pd.DataFrame([[[1, 2, 3]], [[4, 5, 6]], [[7, 8, 9]], [[0, 1, 2]]], columns=['a'])
+
+           a
+0  [1, 2, 3]
+1  [4, 5, 6]
+2  [7, 8, 9]
+3  [0, 1, 2]
+
+
+df.eval('a.list[1]')
+
+0    2
+1    5
+2    8
+3    1
+Name: a, dtype: int64
+
+
+df.query('a.list[2] > 4')
+
+           a
+1  [4, 5, 6]
+2  [7, 8, 9]
+```
+
+
+