Python with PostgreSQL

JC0133

Senior member
Nov 2, 2010
Looking for some guidance/tutorials for using Python with PostgreSQL. Have a few assignments coming up using this.

I need to do stuff like horizontal and vertical fragmentation of a relation (table), parallel sort and join, etc.
 

piokos

Senior member
Nov 2, 2018
First thought: IT (database-centric) studies could help, but I assume that's already something you're trying out...

What exactly do you mean by "need to do"?

The things you've mentioned are not specific to Python or PostgreSQL.
Horizontal and vertical fragmentation is a concept in distributed databases: ways to split a large table into subsets, either by rows/tuples (horizontal) or by columns/attributes (vertical).
Parallel joins and sorts are just that: parallel algorithms for those operations (you can google some examples).
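To make both points concrete, here is a minimal pure-Python sketch (no database involved) in which a relation is just a list of tuples and columns are tuple indexes. Horizontal fragmentation splits rows by ranges of a key column; vertical fragmentation splits columns while keeping the key in every piece so the rows can be rebuilt. The parallel sort range-partitions the rows into five fragments and sorts each one in its own thread; the parallel join hash-partitions both inputs on the join key so equal keys land in the same partition, then joins matching partitions in parallel. All function names are hypothetical, and hash partitioning for the join is my own choice (range partitioning on the join key works the same way). Note that CPython threads don't run Python code simultaneously because of the GIL, so this shows the structure of the algorithms rather than a real speedup.

```python
# Sketch only: a "relation" is a list of tuples and columns are tuple indexes.
# All function names here are hypothetical.
import bisect
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def horizontal_fragments(rows, key, cuts):
    """Split rows by ranges of row[key].

    cuts = [c1, ..., ck] gives k + 1 fragments: (-inf, c1), [c1, c2), ..., [ck, +inf).
    """
    fragments = [[] for _ in range(len(cuts) + 1)]
    for row in rows:
        fragments[bisect.bisect_right(cuts, row[key])].append(row)
    return fragments


def vertical_fragments(rows, key, column_groups):
    """Project each group of column indexes, keeping the key column in every fragment
    so the original rows can be rebuilt by joining the fragments on it."""
    return [[tuple(row[c] for c in [key] + group) for row in rows] for group in column_groups]


def parallel_sort(rows, key, threads=5):
    """Range-partition rows into `threads` fragments and sort each fragment in its own thread."""
    if not rows:
        return []
    lo = min(row[key] for row in rows)
    hi = max(row[key] for row in rows)
    cuts = [lo + i * (hi - lo) / threads for i in range(1, threads)]
    fragments = horizontal_fragments(rows, key, cuts)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        parts = pool.map(lambda frag: sorted(frag, key=lambda row: row[key]), fragments)
        # Every key in fragment i is smaller than every key in fragment i + 1,
        # so concatenating the sorted fragments in order gives a fully sorted result.
        return [row for part in parts for row in part]


def hash_join(left, right, lkey, rkey):
    """Build a hash table on `right`, then probe it with every row of `left`."""
    table = defaultdict(list)
    for rrow in right:
        if rrow[rkey] is not None:  # as in SQL, NULL keys never match
            table[rrow[rkey]].append(rrow)
    return [lrow + rrow for lrow in left for rrow in table.get(lrow[lkey], [])]


def parallel_join(left, right, lkey, rkey, threads=5):
    """Hash-partition both inputs on the join key so equal keys share a partition,
    then join partition i of `left` with partition i of `right` in parallel."""
    lparts = [[] for _ in range(threads)]
    rparts = [[] for _ in range(threads)]
    for row in left:
        lparts[hash(row[lkey]) % threads].append(row)
    for row in right:
        rparts[hash(row[rkey]) % threads].append(row)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        parts = pool.map(hash_join, lparts, rparts, [lkey] * threads, [rkey] * threads)
        return [row for part in parts for row in part]
```

For example, `parallel_sort([(1, 9.5), (2, 3.0), (3, 7.25)], 1)` returns `[(2, 3.0), (3, 7.25), (1, 9.5)]`. The result of `parallel_join` comes out grouped by partition, which is fine because a join result has no inherent order.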

What would that "etc" mean?
The rest of human knowledge about programming and databases? :-D
 

JC0133

I apologize. I guess my question wasn't very clear, or maybe you did answer it. I have an assignment where I am doing a parallel sort/join with Python and PostgreSQL using the Python psycopg2 library. My professor gave us this assignment and talked about parallel programming at a high level, but gave no coding examples. I guess what I was saying is that I have no clue how to do parallel programming, and what I found on YouTube made me unsure whether it was actually parallel programming. So I was asking someone who actually knows how to do it to please post a YouTube link or something so I can study it and try to do it, or to post a programming example. I'll post my assignment below. If someone understands it, maybe you can explain it to me with some programming examples. I have been coding for about 2 years, so I'm still very new.

The required task is to build a generic parallel sort and parallel join algorithm.

1. Implement a Python function ParallelSort() that takes as input: (1) InputTable stored in a PostgreSQL database, (2) SortingColumnName, the name of the column used to order the tuples by. ParallelSort() then sorts all tuples (using five parallelized threads) and stores the sorted tuples in a table named OutputTable (the output table name is passed to the function). The OutputTable contains all the tuples present in InputTable, sorted in ascending order.


Function Interface:

ParallelSort (InputTable, SortingColumnName, OutputTable, openconnection)

InputTable – Name of the table on which sorting needs to be done.

SortingColumnName – Name of the column on which sorting needs to be done. It will be of a numeric type (integer, real or float). Tuples will be sorted in ascending order.

OutputTable – Name of the table where the output needs to be stored.

openconnection – connection to the database.


2. Implement a Python function ParallelJoin() that takes as input: (1) InputTable1 and InputTable2, tables stored in a PostgreSQL database, (2) Table1JoinColumn and Table2JoinColumn, which represent the join key in each input table respectively. ParallelJoin() then joins InputTable1 and InputTable2 (using five parallelized threads) and stores the resulting joined tuples in a table named OutputTable (the output table name is passed to the function). The schema of OutputTable should be similar to the schema of both InputTable1 and InputTable2 combined.


Function Interface:

ParallelJoin (InputTable1, InputTable2, Table1JoinColumn, Table2JoinColumn, OutputTable, openconnection)

InputTable1 – Name of the first table on which you need to perform join.

InputTable2 – Name of the second table on which you need to perform join.

Table1JoinColumn – Name of the column from the first table, i.e. the join key for the first table.

Table2JoinColumn – Name of the column from the second table, i.e. the join key for the second table.

OutputTable - Name of the table where the output needs to be stored.

openconnection – connection to the database.
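A minimal sketch of how the two functions could look with psycopg2, assuming `openconnection` is an open psycopg2 connection, table and column names are passed exactly as stored (they get double-quoted, so they are case-sensitive), and the key columns are numeric as the assignment says. Both functions split the key range [MIN, MAX] into five equal-width ranges (a horizontal fragmentation) and hand each range to one thread with its own cursor; PostgreSQL does the per-range sorting (`ORDER BY`) or joining. The helpers `quote_ident`, `range_predicates`, `run_parallel` and `output_columns` are hypothetical names of my own, and renaming clashing output columns to `<InputTable2>_<column>` is an assumption about what "similar schema" allows. psycopg2's `sql.Identifier` is the more robust way to quote names; the hand-rolled `quote_ident` just keeps the sketch short.

```python
# Sketch only: helper names are hypothetical, `openconnection` is assumed to be psycopg2.
from concurrent.futures import ThreadPoolExecutor

NUM_THREADS = 5  # the assignment asks for five parallel threads


def quote_ident(name):
    """Quote a PostgreSQL identifier: wrap it in double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def range_predicates(col, lo, hi, n=NUM_THREADS):
    """Split [lo, hi] into n equal-width ranges and return one (condition, params) per range.

    The first range has no lower bound and the last no upper bound, so every
    non-NULL value of `col` falls into exactly one range.
    """
    width = (hi - lo) / n
    cuts = [lo + i * width for i in range(1, n)]
    preds = []
    for i in range(n):
        conds, params = [], []
        if i > 0:
            conds.append(f"{col} >= %s")
            params.append(cuts[i - 1])
        if i < n - 1:
            conds.append(f"{col} < %s")
            params.append(cuts[i])
        preds.append((" AND ".join(conds) or "TRUE", params))
    return preds


def run_parallel(conn, make_query, preds):
    """Run make_query(condition) once per range, each in its own thread with its own cursor."""
    def work(pred):
        cond, params = pred
        with conn.cursor() as cur:  # psycopg2 cursors must not be shared between threads
            cur.execute(make_query(cond), params)
            return cur.fetchall() if cur.description else None
    with ThreadPoolExecutor(max_workers=len(preds)) as pool:
        return list(pool.map(work, preds))  # list() re-raises any exception from a worker


def ParallelSort(InputTable, SortingColumnName, OutputTable, openconnection):
    src, col, dst = map(quote_ident, (InputTable, SortingColumnName, OutputTable))
    with openconnection.cursor() as cur:
        cur.execute(f"DROP TABLE IF EXISTS {dst}")
        cur.execute(f"CREATE TABLE {dst} (LIKE {src})")  # same columns, no rows
        cur.execute(f"SELECT MIN({col}), MAX({col}) FROM {src}")
        lo, hi = cur.fetchone()
        if lo is not None:
            # Each thread fetches one key range, already sorted by PostgreSQL.
            parts = run_parallel(openconnection,
                                 lambda cond: f"SELECT * FROM {src} WHERE {cond} ORDER BY {col}",
                                 range_predicates(col, lo, hi))
            # Ranges are disjoint and ascending, so appending them in order keeps the sort.
            for rows in parts:
                if rows:
                    marks = ", ".join(["%s"] * len(rows[0]))
                    cur.executemany(f"INSERT INTO {dst} VALUES ({marks})", rows)
        # NULL keys match no range; PostgreSQL's ascending order puts NULLs last.
        cur.execute(f"INSERT INTO {dst} SELECT * FROM {src} WHERE {col} IS NULL")
    openconnection.commit()


def output_columns(names1, names2, table2):
    """SELECT list for t1.* plus t2.*, renaming table-2 columns whose names clash with table 1."""
    cols = [f"t1.{quote_ident(n)}" for n in names1]
    for n in names2:
        alias = f"{table2}_{n}" if n in names1 else n
        cols.append(f"t2.{quote_ident(n)} AS {quote_ident(alias)}")
    return ", ".join(cols)


def ParallelJoin(InputTable1, InputTable2, Table1JoinColumn, Table2JoinColumn,
                 OutputTable, openconnection):
    t1, t2, dst, c1, c2 = map(quote_ident, (InputTable1, InputTable2, OutputTable,
                                            Table1JoinColumn, Table2JoinColumn))
    with openconnection.cursor() as cur:
        names = []
        for table in (t1, t2):
            cur.execute(f"SELECT * FROM {table} LIMIT 0")  # only to read the column names
            names.append([d[0] for d in cur.description])
        select = (f"SELECT {output_columns(names[0], names[1], InputTable2)} "
                  f"FROM {t1} AS t1 JOIN {t2} AS t2 ON t1.{c1} = t2.{c2}")
        cur.execute(f"DROP TABLE IF EXISTS {dst}")
        cur.execute(f"CREATE TABLE {dst} AS {select} WITH NO DATA")  # combined schema, no rows
        cur.execute(f"SELECT MIN({c1}), MAX({c1}) FROM {t1}")
        lo, hi = cur.fetchone()
    if lo is not None:
        # Each thread joins the table-1 rows of one key range and inserts the result.
        run_parallel(openconnection,
                     lambda cond: f"INSERT INTO {dst} {select} WHERE {cond}",
                     range_predicates(f"t1.{c1}", lo, hi))
    openconnection.commit()
```

Two caveats. psycopg2 allows threads to share one connection as long as each uses its own cursor, but statements on a single connection run one at a time, so for the database work to truly overlap each thread would need its own connection (the given interface only passes one). Also, PostgreSQL does not promise that a `SELECT` without `ORDER BY` returns rows in insertion order, so anything that reads OutputTable back should still sort explicitly.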