Table of Contents
Project Release v3.0
Project Release v3.1
Project Release v3.x
Run Examples
Getting Started
For this project, you will extend your previous project to support multithreading using a custom-made conditional read/write lock to create a thread-safe inverted index and a custom-made work queue to manage worker threads. The build process should be multithreaded such that a worker thread processes a single file. The search process should be multithreaded such that a worker thread processes a single multi-word query line.
To do this, you should create new classes (or extend existing classes) to support multithreading. DO NOT REMOVE YOUR SINGLE THREADING CODE, as you still need to support single threaded building and searching the index.
Do NOT use any of the classes in the java.util.concurrent package and do NOT use the Stream.parallel method for the multithreaded code.
Project Release v3.0
The first release of this project must support multithreaded build using a custom-made conditional read/write lock to create a thread-safe inverted index and a custom-made work queue to manage worker threads. Specifically:
- Your main method must process the command-line argument -threads [num], where the flag -threads enables multithreading and the optional next argument [num] is the number of worker threads to use. If the [num] argument is not provided, is not a number, or is less than 1, use 5 as the default number of worker threads.
- If multithreading is enabled, the inverted index must be made thread-safe using a custom-made conditional read/write lock. The conditional read/write lock must support concurrent read operations, non-concurrent write and read/write operations, and allow an active writer to acquire additional read or write locks as necessary (i.e. reentrant write lock).
- If multithreading is enabled, your code must use a work queue to manage threads. The work queue must automatically track pending (or unfinished) work (or tasks), and provide a method that waits until there is no more pending work. The work queue should also support a graceful shutdown, and should be shut down gracefully after all building and searching operations are complete.
- If multithreading is enabled, your code must support multithreaded building of the thread-safe inverted index using a work queue. Each worker thread should process one file at a time (i.e. create one “task” per file).
- The output requirements are the same as the previous project. As before, your code should only generate output files if the necessary flags are provided.
Your code should support all of the functionality from the previous project as well.
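As a rough illustration of the lock requirements listed above, the following is a minimal sketch built only on synchronized, wait, and notifyAll, so nothing from java.util.concurrent is needed. The class name SimpleReadWriteLock and its method names are hypothetical and are not taken from the MultiReaderLock homework. One assumption to note: it does not support upgrading, so a thread that holds only a read lock and then requests the write lock would wait forever.

```java
/** Hypothetical sketch of a conditional read/write lock (names are illustrative). */
class SimpleReadWriteLock {
    private final Object monitor = new Object();
    private int readers = 0;            // number of read locks currently held
    private int writers = 0;            // hold count of the active writer (reentrant)
    private Thread activeWriter = null; // thread that owns the write lock, if any

    /** Waits while a different thread holds the write lock, then adds a reader. */
    void lockRead() {
        synchronized (monitor) {
            while (writers > 0 && activeWriter != Thread.currentThread()) {
                await();
            }
            readers++;
        }
    }

    /** Releases one read lock and wakes waiting threads when no readers remain. */
    void unlockRead() {
        synchronized (monitor) {
            if (readers == 0) {
                throw new IllegalStateException("No read lock is held.");
            }
            readers--;
            if (readers == 0) {
                monitor.notifyAll();
            }
        }
    }

    /** Waits for all other readers and writers; the active writer passes immediately. */
    void lockWrite() {
        synchronized (monitor) {
            while ((readers > 0 || writers > 0) && activeWriter != Thread.currentThread()) {
                await();
            }
            writers++;
            activeWriter = Thread.currentThread();
        }
    }

    /** Releases one write hold; only the thread that owns the write lock may call this. */
    void unlockWrite() {
        synchronized (monitor) {
            if (activeWriter != Thread.currentThread()) {
                throw new IllegalStateException("Write lock is not held by this thread.");
            }
            writers--;
            if (writers == 0) {
                activeWriter = null;
                monitor.notifyAll();
            }
        }
    }

    /** Waits on the monitor; must be called while holding it. */
    private void await() {
        try {
            monitor.wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for lock.", e);
        }
    }
}
```

A thread-safe index method would typically call lockWrite() before modifying the data and unlockWrite() in a finally block, and use lockRead() and unlockRead() the same way around read-only operations.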
If the -threads flag is not provided, then multithreading should not be enabled. A thread-safe inverted index should not be initialized, a work queue should not be initialized, no worker threads should be created, and the project should execute with a single main thread exactly as previous projects.
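The -threads handling described above could be sketched as follows. ThreadArgs and its methods are hypothetical names chosen for illustration; if your project already has an argument parser, the same checks can live there instead.

```java
/** Hypothetical helper for the -threads [num] argument (names are illustrative). */
class ThreadArgs {
    /** Default number of worker threads when [num] is missing or invalid. */
    static final int DEFAULT_THREADS = 5;

    /** Returns true if the -threads flag appears anywhere in the arguments. */
    static boolean isEnabled(String[] args) {
        for (String arg : args) {
            if ("-threads".equals(arg)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the value that follows -threads if it is an integer of at least 1.
     * Returns the default if the value is missing, not a number, or less than 1.
     * Callers should check isEnabled first, since this also returns the default
     * when the flag is absent.
     */
    static int threads(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-threads".equals(args[i]) && i + 1 < args.length) {
                try {
                    int parsed = Integer.parseInt(args[i + 1]);
                    return parsed >= 1 ? parsed : DEFAULT_THREADS;
                } catch (NumberFormatException e) {
                    // The next argument is another flag or some other non-number.
                    return DEFAULT_THREADS;
                }
            }
        }
        return DEFAULT_THREADS;
    }
}
```

With this sketch, the driver would create the thread-safe index and work queue only when isEnabled returns true, and otherwise follow the existing single-threaded path.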
Project v3.0 Tests
This grade is earned by creating a release that passes the ThreadBuildTests.java tests locally and the tests tagged with test-v3.0 remotely using GitHub Actions. Your code must also pass the Project v2.1 Tests from the previous project. Once passing, use the “Request Project Tests Grade” issue template to request a grade for the passing release.
It is difficult to detect whether your code is multithreading as required. The tests can only detect whether your code initializes a work queue, not whether that work queue is properly used. Submitting code for a test grade that is not multithreading using a work queue and conditional read/write lock will result in a -5% to -10% deduction to the test score. You are strongly encouraged to use logging to verify tasks are being created and run as intended.
Project v3.0 Review
This grade is earned by attending a code review appointment with the instructor. Before requesting code review, your code should have professional style and documentation and integrated feedback from all previous code reviews.
Project Release v3.1
The second release of this project must modify the existing functionality to also support multithreaded search. It must also be faster than the single-threaded implementation. Specifically:
- If multithreading is enabled, your code must support multithreaded searching of the thread-safe inverted index using a work queue. Each worker thread should process one multi-word query line at a time (i.e. create one task per line).
- If multithreading is enabled, the multithreaded building and searching operations must be at least 1.1 times faster than the single-threaded implementations.
Your code should support all of the functionality from the previous project and release as well.
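To make the one-task-per-line pattern concrete, here is a minimal sketch under stated assumptions. SimpleWorkQueue is a hypothetical hand-rolled queue (not the PrimeFinder homework class) that tracks pending work with a counter, and QueryLineDemo counts the words on each line as a stand-in for the real stemming and search logic. Only synchronized, wait, and notifyAll are used.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical minimal work queue that tracks pending work (illustrative only). */
class SimpleWorkQueue {
    private final LinkedList<Runnable> tasks = new LinkedList<>();
    private final Thread[] workers;
    private int pending = 0;          // queued or running tasks; guarded by tasks
    private boolean shutdown = false; // guarded by tasks

    SimpleWorkQueue(int threads) {
        workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(this::work);
            workers[i].start();
        }
    }

    /** Adds a task and counts it as pending until it finishes running. */
    void execute(Runnable task) {
        synchronized (tasks) {
            pending++;
            tasks.addLast(task);
            tasks.notifyAll();
        }
    }

    /** Blocks until there is no more pending work. */
    void finish() {
        synchronized (tasks) {
            try {
                while (pending > 0) {
                    tasks.wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Lets workers exit once the queue is empty; queued tasks still run first. */
    void shutdown() {
        synchronized (tasks) {
            shutdown = true;
            tasks.notifyAll();
        }
    }

    /** Worker loop: take a task, run it outside the lock, then update pending. */
    private void work() {
        while (true) {
            Runnable task;
            synchronized (tasks) {
                try {
                    while (tasks.isEmpty() && !shutdown) {
                        tasks.wait();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (tasks.isEmpty()) {
                    return; // shutdown was requested and nothing is left to run
                }
                task = tasks.removeFirst();
            }
            try {
                task.run();
            } catch (RuntimeException e) {
                // A failing task should not terminate the worker thread.
            } finally {
                synchronized (tasks) {
                    pending--;
                    if (pending == 0) {
                        tasks.notifyAll();
                    }
                }
            }
        }
    }
}

/** Hypothetical demo: one task per query line, with word counting as the "search". */
class QueryLineDemo {
    static Map<String, Integer> wordCounts(List<String> lines, int threads) {
        Map<String, Integer> results = new TreeMap<>();
        SimpleWorkQueue queue = new SimpleWorkQueue(threads);
        for (String line : lines) {
            queue.execute(() -> {
                int words = line.trim().split("\\s+").length; // work done outside the lock
                synchronized (results) {
                    results.put(line, words); // only the shared update is synchronized
                }
            });
        }
        queue.finish();
        queue.shutdown();
        return results;
    }
}
```

Keeping the expensive work outside the synchronized block and locking only for the shared update is one common way to leave room for the speedup this release requires.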
Project v3.1 Tests
This grade is earned by creating a release that passes the ThreadSearchTests.java tests locally and the tests tagged with test-v3.1 remotely using GitHub Actions. Your code must also pass the Project v3.0 Tests from the previous release. Once passing, use the “Request Project Tests Grade” issue template to request a grade for the passing release.
This release also has a speedup requirement. The multithreaded functionality must be at least 1.1 times faster than the single-threaded functionality. The benchmark tests are primarily located in the ThreadBenchTests.java test class tagged with test-v3.1 in the code.
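For a rough local estimate of speedup before running the benchmark tests, a sketch like the following may help. It is an assumption about one reasonable way to measure, not a description of how ThreadBenchTests.java works; the Speedup class and its methods are hypothetical names.

```java
/** Hypothetical helper for estimating speedup locally (not the course benchmark). */
class Speedup {
    /** Runs the action a few times untimed, then returns the average timed nanoseconds. */
    static double averageNanos(Runnable action, int warmups, int rounds) {
        for (int i = 0; i < warmups; i++) {
            action.run(); // warmup rounds give the JVM a chance to optimize the code
        }
        long total = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            action.run();
            total += System.nanoTime() - start;
        }
        return (double) total / rounds;
    }

    /** Speedup is single-threaded time divided by multithreaded time. */
    static double speedup(double singleNanos, double multiNanos) {
        return singleNanos / multiNanos;
    }
}
```

For example, if the single-threaded run averages 300 ms and the multithreaded run averages 200 ms, the speedup is 300 / 200 = 1.5. Running without the output flags, as in the run examples, keeps file writing out of the measurement.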
Project v3.1 Review
This grade is earned by attending a code review appointment with the instructor. Before requesting code review, your code should have professional style and documentation and integrated feedback from all previous code reviews.
Project Release v3.x
All additional releases of this project must maintain the existing functionality and improve upon the design and efficiency of the code.
Project v3.2 Review
All v3.2+ code reviews have a stricter speedup requirement. The multithreaded functionality must be at least 1.5 times faster than the single-threaded functionality. If you are unable to reach this speedup on GitHub Actions, reach out on Piazza for help before requesting a code review.
Once you are at this point in project 3, you should consider starting work on the next project in a separate branch.
Project v3.x Design
This grade is earned by passing the code review process with the instructor. This takes approximately 3 to 4 code reviews total. The final project release must be 1.7 times faster than the single-threaded functionality when run on GitHub Actions.
Run Examples
The following are a few examples (non-comprehensive) to illustrate the usage of the command-line arguments that can be passed to your Driver class via a “Run Configuration” in Eclipse, assuming you set the working directory to the project-tests directory.
Consider the following example:
-text "input/text/simple/" -query "input/query/simple.txt" -results actual/search-exact-simple.json -threads 3
The above arguments behave the same as project 2, except that they use 3 worker threads in a work queue to multithread. For v3.0 releases, this should multithread building only. For v3.1+ releases, this should multithread both building and searching.
-text "input/text/simple/" -query "input/query/simple.txt" -results actual/search-exact-simple.json -threads
The above arguments are nearly the same, except that they use the default of 5 worker threads.
-text "input/text/simple/" -query "input/query/simple.txt" -threads
The above arguments are similar, except that they do NOT produce any file output. The code should still build the index and calculate the search results! This is important for benchmarking your code.
-text "input/text/simple/" -query "input/query/simple.txt" -results actual/search-exact-simple.json
This should behave exactly the same as project 2, using single-threading without any worker threads.
Getting Started
The following sections may be useful for getting started on this project.
You should NOT wait until you have covered all of the associated homework assignments and lecture content to start the project. You should develop the project iteratively as you progress throughout the semester, integrating concepts one at a time into your project code.
Related Content
The following homework assignments and lectures may be useful for this project:
- The LoggerSetup homework is useful for learning how to set up and configure log4j2, which will be helpful when it comes to debugging multithreaded code.
- The MultiReaderLock homework is useful for creating a custom-made conditional read/write lock. The conditional lock class can be used directly for your project. It also illustrates how to use a conditional lock to make a data structure class thread-safe.
- The PrimeFinder homework is useful for creating a work queue that tracks pending work. This work queue class can be used directly for your project. It also illustrates how to use this work queue and create tasks for non-recursive problems.
- The synchronization lecture code illustrates how to use a conditional read/write lock to make a data structure class thread-safe.
- The work queues lecture code illustrates how to use a work queue and create tasks for recursive problems. If your approach is not recursive, this example might not be helpful for this project.
You can modify homework assignments and lecture code as necessary for this project. However, make sure your code is passing all of the tests and you understand the concepts before using the code.
Suggestions
Do not start on this project until you understand the multithreading code shown in class. If you are stuck on the code shown in class, we are here to help!
Your goal should be to get to testable code as quickly as possible, and then to develop iteratively until you pass the functionality tests. One possible breakdown of tasks is:
- Configure log4j2 and add debug messages to your code. Once you are certain a class is working, disable debug messages for that class in your log4j2.xml configuration file.
- Extend your previous inverted index to create a thread-safe version using a custom-made conditional read/write lock.
- Create new code to build the index using a work queue (creating one task per file). Make sure your code still passes the tests.
- Test your code in a multitude of settings and with different numbers of threads. Some issues will only come up occasionally, or only on certain systems.
- Test your code with logging enabled. Then, test your code with logging completely disabled. Your code will run faster without logging, which can expose concurrency problems that did not appear with logging enabled.
Do not worry about efficiency until after your first code review.
It is important to get started early so you have plenty of time to think about how you want to approach the project and start coding iteratively. Planning to complete the code in too large of a chunk is a recipe for getting stuck and falling behind!