Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

sql - Pass multiple sets or arrays of values to a function

I'm writing a PL/pgSQL function in PostgreSQL 9.3.10 to return who has attended certain classes/sessions from the following table:

Attendance
+-------+---------+---------+
| Class | Section |  Name   |
+-------+---------+---------+
|    1  |      1  | Amy     |
|    1  |      1  | Bill    |
|    1  |      2  | Charlie |
|    1  |      2  | Dan     |
|    2  |      1  | Emily   |
|    2  |      1  | Fred    |
|    2  |      2  | George  |
+-------+---------+---------+
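To reproduce the table above locally, a minimal setup might look like the following. The column types (int, int, text) and the index definition are assumptions; the question shows only the data and, in the EXPLAIN output, an index named class_section_idx.

```sql
-- Assumed schema: the question does not show the actual DDL.
CREATE TABLE attendance (
   class   int  NOT NULL,
   section int  NOT NULL,
   name    text NOT NULL
);

-- Sample rows copied from the table in the question.
INSERT INTO attendance (class, section, name) VALUES
   (1, 1, 'Amy'),
   (1, 1, 'Bill'),
   (1, 2, 'Charlie'),
   (1, 2, 'Dan'),
   (2, 1, 'Emily'),
   (2, 1, 'Fred'),
   (2, 2, 'George');

-- Assumed definition for the index named in the EXPLAIN output.
CREATE INDEX class_section_idx ON attendance (class, section);
```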

What I want to do is, given an array of class/section id pairs (int[][]), return all people who are in those classes/sections. For example, my_func(ARRAY[[1,1],[2,2]]) returns:

+-------+---------+---------+
| Class | Section |  Name   |
+-------+---------+---------+
|    1  |      1  | Amy     |
|    1  |      1  | Bill    |
|    2  |      2  | George  |
+-------+---------+---------+

If I knew the pairs beforehand, it would be a simple:

SELECT * FROM attendance 
WHERE ((class = 1 AND section = 1) OR (class = 2 AND section = 2));

Instead, the pairs will be a parameter of the function.

Right now, the only way I can think of doing this is to have the function essentially build an SQL query string by appending a bunch of WHERE clauses to the end of the query and then calling EXECUTE. Is there a better way to get my result?

EDIT: I implemented @Erwin's suggestion and can now get the results I want. Unfortunately, it appears to be fairly slow. Here is the function I am running:

CREATE OR REPLACE FUNCTION public.get_attendance(int[])
  RETURNS  TABLE(
    class_c int,
    section_c int
  )
AS
$BODY$
  BEGIN
    RETURN QUERY
      SELECT class, section
      FROM generate_subscripts($1, 1) as i 
      INNER JOIN attendance ON attendance.class = $1[i][1]
                            AND  attendance.section = $1[i][2];

  END;
$BODY$
LANGUAGE plpgsql VOLATILE;

Querying it like so:

SELECT *  FROM get_attendance(ARRAY[[1,15],[2,15],[3,8]]);

And I am getting the following EXPLAIN ANALYZE output:

Merge Join  (cost=60.26..50139.72 rows=30840 width=8) (actual time=44.174..142.100 rows=25290 loops=1)
  Merge Cond: ((attendance.class = (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][1])) AND (attendance.section = (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][2])))
  ->  Index Only Scan using class_section_idx on attendance  (cost=0.43..43372.25 rows=1233588 width=8) (actual time=0.009..86.625 rows=1145046 loops=1)
        Heap Fetches: 0
  ->  Sort  (cost=59.83..62.33 rows=1000 width=4) (actual time=0.010..0.757 rows=10031 loops=1)
        Sort Key: (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][1]), (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][2])
        Sort Method: quicksort  Memory: 25kB
        ->  Function Scan on generate_subscripts i  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.006..0.007 rows=3 loops=1)

The problem is that the query is scanning through all attendances in the attendance table without filtering them until the join. Is there any way to fix this?


1 Reply


You can achieve that with a simple SQL function. The key feature is the function generate_subscripts():

CREATE OR REPLACE FUNCTION f_attendance(_arr2d int[])
  RETURNS SETOF attendance AS
$func$
   SELECT a.*
   FROM   generate_subscripts($1, 1) i
   JOIN   attendance a ON a.class   = $1[i][1]
                      AND a.section = $1[i][2]
$func$  LANGUAGE sql STABLE ROWS 10;

Call:

SELECT * FROM f_attendance(ARRAY[[1,1],[2,2]]);

Or the same with an array literal - which is more convenient in some contexts, especially with prepared statements:

SELECT * FROM f_attendance('{{1,1},{2,2}}');

The function always expects a 2D array. Even if you pass a single pair, nest it:

SELECT * FROM f_attendance('{{1,1}}');
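To see why the join works, it can help to look at what generate_subscripts() produces on its own. The query below is only an illustration: the aliases t and arr are made up for the example, and it relies on the implicit LATERAL reference available since PostgreSQL 9.3.

```sql
-- generate_subscripts(arr, 1) returns one row per subscript of the
-- first dimension; arr[i][1] and arr[i][2] then pick the two values
-- of each pair.
SELECT i, arr[i][1] AS class, arr[i][2] AS section
FROM  (SELECT '{{1,15},{2,15},{3,8}}'::int[] AS arr) t
     , generate_subscripts(t.arr, 1) AS i
ORDER  BY i;

--  i | class | section
-- ---+-------+---------
--  1 |     1 |      15
--  2 |     2 |      15
--  3 |     3 |       8
```

Each of these rows is then matched against attendance on (class, section), which is what the JOIN condition in f_attendance() does.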

Audit of your implementation

  1. You made the function VOLATILE, but it can be STABLE. Per documentation:

    Because of this snapshotting behavior, a function containing only SELECT commands can safely be marked STABLE.


  2. You also use LANGUAGE plpgsql instead of sql, which makes sense if you execute the function multiple times in the same session, since PL/pgSQL can reuse cached query plans. But then you must also make it STABLE or you lose that potential performance benefit. The manual once more:

    STABLE and IMMUTABLE functions use a snapshot established as of the start of the calling query, whereas VOLATILE functions obtain a fresh snapshot at the start of each query they execute.

  3. Your EXPLAIN output shows an Index Only Scan, not a sequential scan like you suspect in your comment.

  4. There is also a sort step in your EXPLAIN output that does not match the code you show. Are you sure you copied the right EXPLAIN output? How did you obtain it anyway? PL/pgSQL functions are black boxes to EXPLAIN. Did you use auto_explain?

  5. The Postgres query planner has no idea how many array elements the passed parameter will have, so it is hard to plan the query, and it may default to a sequential scan (depending on other factors). You can help by declaring the expected number of rows. If you typically don't have more than 10 items, add ROWS 10 as I did above, and test again.
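Putting the audit points together, a revised version of the function from the question might look like the sketch below. It keeps the original signature and PL/pgSQL, and only adds STABLE and ROWS 10. Note that ROWS is the planner's estimate of how many rows the function returns, so 10 is a placeholder to adjust for your data. The auto_explain settings after it are one way to get at the plan of the query inside the function; LOAD typically requires superuser privileges, and the plans are written to the server log rather than returned to the client.

```sql
-- Sketch: the function from the question with points 1, 2 and 5 applied.
CREATE OR REPLACE FUNCTION public.get_attendance(int[])
  RETURNS TABLE (class_c int, section_c int)
  LANGUAGE plpgsql STABLE ROWS 10 AS   -- ROWS 10 is a placeholder estimate
$BODY$
BEGIN
   RETURN QUERY
   SELECT a.class, a.section
   FROM   generate_subscripts($1, 1) AS i
   JOIN   attendance a ON a.class   = $1[i][1]
                      AND a.section = $1[i][2];
END
$BODY$;

-- Sketch: log the plans of statements nested inside functions (point 4).
LOAD 'auto_explain';                            -- typically superuser only
SET auto_explain.log_min_duration = 0;          -- log every statement
SET auto_explain.log_analyze = true;            -- include actual times and row counts
SET auto_explain.log_nested_statements = true;  -- include statements run inside functions

SELECT * FROM public.get_attendance('{{1,15},{2,15},{3,8}}');
```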

